ESTIMATION AND TESTS OF SIGNIFICANCE IN FACTOR
ANALYSIS 


C. RADHAKRISHNA RAO
VISITING RESEARCH PROFESSOR, UNIVERSITY OF ILLINOIS 


A distinction is drawn between the method of principal components 
developed by Hotelling and the common factor analysis discussed in psycho- 
logical literature both from the point of view of stochastic models involved 
and problems of statistical inference. The appropriate statistical techniques 
are briefly reviewed in the first case and detailed in the second. A new method 
of analysis called the canonical factor analysis, explaining the correlations 
between rather than the variances of the measurements, is developed. This 
analysis furnishes one out of a number of possible solutions to the maximum 
likelihood equations of Lawley. It admits an iterative procedure for esti- 
mating the factor loadings and also for constructing the likelihood criterion 
useful in testing a specified hypothesis on the number of factors and in 
determining a lower confidence limit to the number of factors. 


1. Introduction 


Whatever may be the arguments for or against factor analysis as a tool 
in psychological research, the statistical problems it involves have been of 
considerable interest to the statistician mainly because of their complexity. 
Two important contributions on the statistical side are by Hotelling (8), who 
introduced the principal component analysis, and Lawley (11, 12), who pro- 
vided a test criterion for judging the significance of factors in addition to 
working out the maximum-likelihood equations of estimation. These two 
authors were, however, considering two different problems, both of which 
seem to have important application. They are sometimes considered as two 
possible formulations of the same problem providing the same answer. In 
theory it helps to make a distinction between the two. The term principal 
component analysis (PCA) should be used for Hotelling’s formulation of the 
problem and its solution; the term factor analysis should be used for the 
specialized formulation considered in psychological literature and for the 
various solutions offered (see also 10). Lawley was considering the latter 
problem under the assumption that the variables (test scores) are normally 
distributed. 

Illustrations have appeared from time to time to show that PCA gives 
nearly the same relative magnitudes of factor loadings as any effective 
method of factor analysis. This is true only when what have been termed as 
communalities are very nearly equal for all the tests as shown in section 3.1 
of this paper. 

The PCA is sometimes modified (3, p. 114; 7) by the insertion of communalities
in the diagonal of the correlation matrix. This method, called the
principal factor analysis (PFA), seems to provide a valid approach to the 
problem of factor analysis; however, it carries with it the flavor of principal 
component analysis intended to explain the variations in the standardized 
scores. It will be seen that an alternative approach developed in section 3.2 
explains most effectively the correlations between the test scores in a battery. 
This method may be called a canonical factor analysis (CFA). Formulas for 
estimation are detailed in section 4. 

The tests of significance associated with component analysis and factor 
analysis also differ to some extent. In the former case interest chiefly lies in 
the magnitude of, or the relationship between, the latent roots of the hypo- 
thetical matrix of raw correlations or those corrected for attenuation. In 
factor analysis it is the decomposition of the correlation matrix as the sum 
of a diagonal matrix and a positive semi-definite matrix. The differences in 
nature of these tests are sometimes fundamental (section 2.2). The tests of 
component analysis are contained in Hotelling’s paper (8), and an appropriate 
test for factor analysis is given by Lawley (11). An alternative form of 
Lawley’s test yielding slightly more precise results is given in section 4.2. It 
is also shown (section 4.3) that the test criterion can be calculated during 
the process of estimation and used in obtaining a lower confidence limit to 
the number of factors. 

Recently Bartlett (1, 2) proposed a test involving the latent roots of
the correlation matrix intended to study "the correlation structure in relation
to the variance of the measurements." The exact nature of the hypothesis
for which Bartlett’s test is applicable and the conditions under which it is 
valid are examined in section 2.2. It appears that this test does not provide 
a complete answer to either form of analysis under consideration. 

For a full account of tests of significance in factor analysis developed 
up to 1952, the reader is referred to Burt (3). 

The author of this article is not concerned here in examining which of 
the methods, component or factor analysis, is relevant in problems of psycho- 
logical research or whether both methods provide rather similar numerical 
results (not identical in general) leading to the same psychological interpre- 
tation. The main emphasis is on the differences in statistical techniques 
needed in these two cases and a detailed examination of the methods for 
factor analysis. 


2. Problems of Factor and Component Analyses 


2.1 Factor Analysis 


Factor analysis postulates an underlying structure of a set of measure- 
ments in terms of hypothetical variables (non-observable) depending on 
what are called common and specific or individual factors. If $x_1, \ldots, x_p$
denote $p$ different measurements on an individual, then $x_i$ is written

$$x_i = z_i + s_i \quad (i = 1, \ldots, p), \tag{2.1.1}$$

where $z_i$, the variables depending on common factors, and $s_i$, the variables
depending on specific factors, satisfy the following conditions of zero
covariance:

$$\operatorname{cov}(z_i, s_i) = 0, \quad \operatorname{cov}(z_i, s_j) = 0, \quad \operatorname{cov}(s_i, s_j) = 0 \quad (i \neq j). \tag{2.1.2}$$


Sometimes, another independent variable representing unreliability in the
measurement $x_i$ is added to $(z_i + s_i)$, but for purposes of factor analysis
based on unrepeated test scores of individuals, this variable can be combined
with $s_i$ without loss of generality. If repeated test scores are available,
then a more comprehensive analysis of the common and specific factors is
possible. This latter analysis is not considered here.

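From the structural setup (2.1.1), (2.1.2) it follows that

$$V(x_i) = V(z_i) + V(s_i), \qquad \sigma_{ii} = \gamma_{ii} + \delta_i,$$

$$\operatorname{cov}(x_i, x_j) = \operatorname{cov}(z_i, z_j) + \operatorname{cov}(z_i, s_j) + \operatorname{cov}(s_i, z_j) + \operatorname{cov}(s_i, s_j) = \operatorname{cov}(z_i, z_j),$$

$$\sigma_{ij} = \gamma_{ij} \quad (i \neq j).$$

If $\Sigma$, $\Gamma$, $\Delta$ denote the dispersion matrices of the vector variables
$x$, $z$, $s$, then

$$\Sigma = \Gamma + \Delta,$$

where $\Delta$ is a diagonal matrix.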

It is seen from the above analysis that any correlation between $x_i$, $x_j$ is
solely due to the correlation between $z_i$, $z_j$. What we can actually observe
are the values of the variables $x$ on a group of individuals but not $z$, $s$,
which are not operationally defined, but whose hypothetical existence is postulated.
We thus obtain an estimate of the matrix $\Sigma$. The subject of factor analysis
is mainly concerned with the estimation of the matrix $\Gamma$ starting with an
estimate of $\Sigma$. The object is not to find any matrix $\Gamma$ satisfying the
condition $\Sigma = \Gamma + \Delta$ but the one which has the least complexity leading
to a parsimonious description of the relationships between the observable variables
$x$. The complexity, when defined as the rank of the matrix $\Gamma$, has a special
significance for the problems on which this technique is applied, as shown in
the subsequent sections of this paper.
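
The structure described above can be illustrated numerically. The following is a minimal sketch, assuming a made-up loading matrix `L` for four measurements on two common factors and made-up specific variances; none of these numbers come from the paper. It builds $\Gamma$, $\Delta$ and $\Sigma = \Gamma + \Delta$, and checks that the covariances of $x$ are those of $z$ while the rank of $\Gamma$ is lower than that of $\Sigma$.

```python
import numpy as np

# Hypothetical loadings of p = 4 measurements on k = 2 common factors
# (illustrative values only, not taken from the paper).
L = np.array([[0.8, 0.1],
              [0.7, 0.3],
              [0.2, 0.6],
              [0.1, 0.7]])

Gamma = L @ L.T                              # dispersion matrix of z (rank 2)
Delta = np.diag([0.35, 0.42, 0.60, 0.50])    # diagonal dispersion matrix of s
Sigma = Gamma + Delta                        # dispersion matrix of x

# Off-diagonal elements: sigma_ij = gamma_ij for i != j.
off_diag = ~np.eye(4, dtype=bool)
same_covariances = np.allclose(Sigma[off_diag], Gamma[off_diag])

rank_gamma = np.linalg.matrix_rank(Gamma)    # complexity of the factor structure
rank_sigma = np.linalg.matrix_rank(Sigma)    # full rank because Delta is positive definite

print(same_covariances, rank_gamma, rank_sigma)
```

In practice only an estimate of $\Sigma$ is available, so the rank of $\Gamma$ cannot be read off in this way; that is the estimation problem taken up in the later sections.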

Some of the statistical problems of factor analysis are: 


(a) to estimate the minimum rank of the dispersion matrix (variances
and covariances) $\Gamma$ of the variables $z_1, \ldots, z_p$ occurring in the structural
equations (2.1.1, 2.1.2),

(b) to test any hypothesis specifying the minimum rank of $\Gamma$,

(c) to estimate a basis of the common factor space (defined below),

(d) to predict the value of any common factor from the observed set
$x_1, \ldots, x_p$ for any individual.


The statement that the rank of $\Gamma$ is $k < p$ implies that the variables
$z_1, \ldots, z_p$ can be expressed as linear combinations of $k$ independent variables
only. To bring out the precise meaning of such a dependence let us consider
the entire vector space of elements consisting of all linear combinations of the
set $z_1, \ldots, z_p$ of variables introduced into the different tests of a battery
with the restriction that any two variables differing by a constant represent
the same element. We may call any element of this space a common factor
variable or simply a factor variable unless otherwise specified. The vector
product of any two elements $f$ and $g$ of this space is defined by $\operatorname{cov}(f, g)$ and
the square of the norm of $f$ by the variance of $f$, $V(f)$.

Vectors $f_1, f_2, \ldots, f_k$ of this space are said to be independent if no
linear combination $a_1 f_1 + a_2 f_2 + \cdots + a_k f_k$ (the $a_i$ not all zero simultaneously)
has zero norm. A vector space is said to be finite dimensional if all its elements
can be expressed as linear combinations of a finite number of elements. In
such a case there is a minimal number of such elements called the rank of the
space. A set of such elements necessarily independent (to be minimal) is
called a basis. A basis of a vector space is not, however, unique but its rank is.

We can always choose a basis such that its elements are orthogonal
(zero vector product), implying that the chosen factor variables are uncorrelated.
A convenience provided by such a choice is that a basis can be
simply represented by a set of correlations between the measurements and
factors. If $Z_1, \ldots, Z_k$ is an orthogonal basis then each $z_i$ can be expressed in
terms of the $Z_j$:

$$z_i = a_{i1} Z_1 + a_{i2} Z_2 + \cdots + a_{ik} Z_k. \tag{2.1.3}$$

The covariance of $x_i$ with $Z_j$ is $a_{ij}$, the coefficient of $Z_j$ in the representation
(2.1.3). This may be regarded as a correlation coefficient once $x_i$ and $Z_j$ are
properly standardized. A basic set of factors or equations (2.1.3) can be
represented by the matrix of correlations

$$\begin{pmatrix} a_{11} & \cdots & a_{1k} \\ \vdots & & \vdots \\ a_{p1} & \cdots & a_{pk} \end{pmatrix} \tag{2.1.4}$$

which is also called the factor loading matrix. Such a basis is not unique
because the choice of $Z_j$, or the representation (2.1.3), is not unique. A basis
is just meant to generate the entire space of factor variables which are linear
combinations of some hypothetical variables introduced into the various
tests. From this point of view one basis is as good as any other. Of course,
given one basis, orthogonal or oblique, the other can be derived by a linear
transformation.

The choice of a suitable basis which is "psychologically meaningful"
has largely rested with the psychologist, perhaps rightly so. But it is quite
conceivable, once the psychological meaning is translated to mean some
precisely stated restrictions on the basis, that its choice will turn out to be a
problem of statistical estimation. In this sense the graphical methods of
rotation of factor loadings advocated by Thurstone (17) and the quartimax
method of Neuhaus and Wrigley (13) are statistical methods of factor analysis,
where the number of zero or small loadings is maximized. Such a restriction, 
or even the orthogonality of a basis, may not be the most helpful in leading 
to a suitable psychological interpretation or the discovery of “real entities.” 
The choice of the restrictions to be imposed on the basis is perhaps a problem 
for psychological research. An objective method developed in this connection 
by Cattell (4, 5) seems to have interesting possibilities from a statistical 
point of view. 


2.2. Principal Components 


It was pointed out by Sir Cyril Burt that this method was originally 
put forward by Karl Pearson in 1901. But the statistical problems of estima- 
tion and testing connected with the principal components were first con- 
sidered by Hotelling. 

Hotelling (8) considers two types of problems: 

First, without assuming a decomposition of the measurements as in 
(2.1.1), hypotheses are framed in terms of the latent roots of the correlation 
matrix with a view to studying the shape of the scatter of the standardized 
scores in the p-dimensional space or alternatively the relative importance of 
the different principal components in explaining the total variance. For 
instance, if some of the calculated roots are not significantly different then 
the components corresponding to them may be considered equally important. 

Let us consider a specific hypothesis that the $(i + 1)$th to the $(i + r)$th
roots of the population correlation matrix (denoted by $\rho$) are equal. The value
of $i$ may be $0, 1, \ldots$ to $(p - r)$.

This hypothesis imposes a restriction on $\rho$, viz., that it admits the
decomposition

$$\rho = \theta + \lambda I, \tag{2.2.1}$$

where $\theta$ is a matrix of rank $(p - r)$, $\lambda$ is the common value of the roots from
the $(i + 1)$th to the $(i + r)$th, and $I$ is the unit matrix.

Any such hypothesis or a similar one based on the dispersion matrix
$\Sigma$ instead of $\rho$ can be tested by a likelihood-ratio criterion $\Lambda$, provided the
sample size is moderately large. Exactly how large the sample size should be is a
matter for further investigation. The statistic $(-2 \log_e \Lambda)$ is distributed in
large samples as $\chi^2$ with degrees of freedom equal to the number of restrictions
on the free parameters imposed by the hypothesis.

The number of restrictions in a hypothesis of the form (2.2.1) is equal
to the number of restrictions on the symmetric matrix $\theta$ with rank $(p - r)$
minus one for the unknown $\lambda$. Only $(p - r)$ rows and columns in $\theta$ are independent;
the rest of the elements depend on them. Therefore, the number
of restrictions on the elements of $\theta$ is

$$(p - r)(p - r + 1)/2,$$

and with one less we have

$$(p - r - 1)(p - r + 2)/2 \tag{2.2.2}$$

degrees of freedom for the $\chi^2$ approximation.

If $A$ is the estimated dispersion matrix from observations on $n$ individuals,
then the test criterion for the hypothesis (2.2.1) with $\Sigma$ instead of
$\rho$ is

$$-(n - 1) \log \frac{|A|}{|\hat{\Sigma}|}, \tag{2.2.3}$$

where $\hat{\Sigma}$ is the estimated dispersion matrix under the conditions of the
hypothesis. The latent roots of $\hat{\Sigma}$,

$$\mu_1, \mu_2, \ldots, \mu_i, \mu_{i+1}, \ldots, \mu_{i+r}, \mu_{i+r+1}, \ldots, \mu_p,$$

are connected with the latent roots $\lambda_1, \ldots, \lambda_p$ of $A$ in the following way:

$$\mu_j = \lambda_j \quad (j = 1, \ldots, i, \; i + r + 1, \ldots, p),$$

$$\mu_{i+k} = \frac{\lambda_{i+1} + \cdots + \lambda_{i+r}}{r} \quad (k = 1, \ldots, r).$$

Since

$$|A| = \lambda_1 \cdots \lambda_p, \qquad |\hat{\Sigma}| = \mu_1 \cdots \mu_p,$$

the ratio $|A| / |\hat{\Sigma}|$ is

$$\frac{\lambda_{i+1} \cdots \lambda_{i+r}}{\left[ (\lambda_{i+1} + \cdots + \lambda_{i+r}) / r \right]^{r}}, \tag{2.2.4}$$

which is a suitable power of the ratio of the geometric to the arithmetic mean
of the $(i + 1)$th to $(i + r)$th roots of $A$. From this point of view it would
appear, by choosing $i = (p - r)$ in (2.2.4), that Bartlett's (1) test using the
dispersion matrix instead of the correlations is valid for judging the significance
of equality of the least $r$ roots.
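
The criterion (2.2.3), with the ratio in the form (2.2.4), needs only the latent roots of the sample dispersion matrix. The sketch below is illustrative: the function name `equal_roots_statistic` and the two diagonal test matrices are assumptions made here for the example, and the roots are taken in descending order so that choosing $i = p - r$ picks out the least $r$ roots.

```python
import numpy as np

def equal_roots_statistic(A, n, i, r):
    """-(n - 1) * log(|A| / |Sigma_hat|) when roots i+1, ..., i+r are assumed equal.

    A is a sample dispersion matrix from n individuals. By (2.2.4) the ratio
    depends only on the geometric and arithmetic means of the r roots involved.
    """
    roots = np.sort(np.linalg.eigvalsh(A))[::-1]        # latent roots, largest first
    block = roots[i:i + r]                              # roots i+1, ..., i+r
    log_ratio = np.sum(np.log(block)) - r * np.log(np.mean(block))
    return -(n - 1) * log_ratio

# Illustrative matrices: the last three roots are equal in the first, unequal in the second.
A_equal = np.diag([5.0, 2.0, 2.0, 2.0])
A_unequal = np.diag([5.0, 3.0, 2.0, 1.0])

stat_equal = equal_roots_statistic(A_equal, n=101, i=1, r=3)
stat_unequal = equal_roots_statistic(A_unequal, n=101, i=1, r=3)
print(stat_equal, stat_unequal)
```

The statistic is zero when the chosen roots of $A$ coincide and grows as they spread apart, since the geometric mean never exceeds the arithmetic mean. Referring it to a $\chi^2$ table requires the degrees of freedom discussed in the text, which this sketch does not compute.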
Unfortunately the test does not seem to reduce to the form (2.2.4) in
terms of the roots of the observed correlation matrix $R$ when the hypothesis
is as stated in (2.2.1) in terms of the population correlation matrix $\rho$. The
effect of standardizing the variables by the sample standard deviations is not
properly allowed for by a criterion of the form (2.2.4). This is also partly
revealed by Bartlett's own evaluation of the degrees of freedom by the expectation
method in a simple case. They depend on the unknown correlations
and reach the value (2.2.2) only in a limiting case, while for a genuine likelihood
ratio this is not expected. The exact evaluation of the test criterion
depends on complicated equations which require further investigation.

Secondly, Hotelling considers the problem of “testing the variances of 
components against the variance to be expected on account of the inaccuracy 
of the tests as revealed by their self-correlations or reliability coefficients.” 
For this purpose a test score is thought of as made up of two parts, a true
score with variance unity and a random error. Thus

$$x_i = X_i + \epsilon_i \quad (i = 1, \ldots, p), \tag{2.2.5}$$

with the conditions $\operatorname{cov}(\epsilon_i, \epsilon_j) = 0$ $(i \neq j)$. The hypothesis stated above is
interpreted to imply that the true scores $X_i$ are linearly dependent, i.e.,
"the scatter diagram of the true scores will lie in a flat space of smaller
dimensionality immersed in the $p$-dimensional space." If independent estimates
of the variances of $\epsilon_i$ are available, either from an external source or
by re-tests on individuals, there is no need to consider the true scores as
random variables in order to test the above hypothesis. The general multivariate
tests of dimensionality developed in more complicated situations
are directly applicable for this problem. The non-stochastic model on the
scores corrected for unreliabilities used in testing the second hypothesis
provides a strong contrast to tests in factor analysis where, of necessity, all
the variables (the common and specific factors) involved are considered to
be stochastic, which makes the problem more complex.


3. Special Characterizations of a Basis in Factor Analysis


Using the vector notation

$$x = (x_1, \ldots, x_p), \quad z = (z_1, \ldots, z_p), \quad s = (s_1, \ldots, s_p),$$

the equation (2.1.1) can be written

$$x = z + s. \tag{3.1}$$

The dispersion matrix of $x$ (using $D$ for dispersion) is

$$D(x) = D(z) + D(s),$$

or

$$\Sigma = \Gamma + \Delta, \tag{3.2}$$

where $\Sigma$, $\Gamma$ and $\Delta$ are defined by equation (3.2). The covariance of $z$ and $s$ is
zero because of conditions (2.1.2). The matrix $\Gamma$ is positive semi-definite with
rank $k < p$ and $\Delta$ is a positive-definite diagonal matrix. The equation (3.2)
supplies the fundamental decomposition of the dispersion matrix $\Sigma$ in terms
of those of the hypothetical variables postulated by a factorial structure. If
the rank of $\Gamma$ is $k < p$, the space of common factors has a basis of $k$ independent
factors as shown in section 2.1. For a proper identification of the
space and "an orderly selection of independent factors" there is a need to
characterize a basis in a convenient way. A basis so characterized need not
admit a psychological interpretation, for only mathematical and statistical
convenience is being sought at this stage. A basis once obtained can always
be transformed to meet other requirements. Two special characterizations
are discussed here.


3.1 First Characterization 


Let $l = (l_1, \ldots, l_p)$ be a vector of arbitrary coefficients giving rise to a
new factor variable

$$lz' = l_1 z_1 + \cdots + l_p z_p.$$

The variation in the variable $x_i$ explained by the factor variable $lz'$ is

$$\frac{\operatorname{cov}^2(x_i, lz')}{V(lz')} = \frac{(l_1 \gamma_{1i} + \cdots + l_p \gamma_{pi})^2}{l \Gamma l'}, \tag{3.1.1}$$

assuming that $l \Gamma l'$, the variance of $lz'$, is not zero. The total variation explained
in all the variables is

$$\sum_i \frac{(l_1 \gamma_{1i} + \cdots + l_p \gamma_{pi})^2}{l \Gamma l'} = \frac{l \Gamma \Gamma l'}{l \Gamma l'}. \tag{3.1.2}$$

Let us choose $l$ such that (3.1.2) is a maximum. Differentiating with respect
to the vector $l$ (see 14, p. 21), the equation leading to the optimum value $\lambda$
of the ratio (3.1.2) and the vector $l \Gamma$ is

$$l \Gamma \Gamma - \lambda l \Gamma = 0$$

or eliminating $l \Gamma$

$$|\Gamma - \lambda I| = 0, \tag{3.1.3}$$

where $I$ is the identity matrix. This shows that $\lambda$ is the maximum latent
root of $\Gamma$ and $m = l \Gamma$ is the latent vector corresponding to it. Since the
vector $m$ satisfies the equation

$$m \Gamma = \lambda m,$$

and

$$m = l \Gamma,$$

so that

$$m \Gamma = \lambda l \Gamma, \tag{3.1.4}$$

the vector $m$ itself can be taken to be a solution of $l$. We thus obtain the first
factor variable as a linear combination of $z_1, \ldots, z_p$. From the theory of
canonical roots and vectors (14, p. 24), it would then follow that the second
factor variable, which explains the highest proportion of the residual variation
independently of the first, is the linear combination corresponding
to the second canonical vector. There are as many linear combinations as
there are non-zero roots $\lambda$, which is equal to the rank of the matrix $\Gamma$. The
linear combinations of $z_1, \ldots, z_p$ supplied by the canonical vectors of zero
roots $\lambda$ vanish identically, indicating the dependence of the factor variables
associated with the measurements $x_1, \ldots, x_p$.
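
The maximization leading to (3.1.3) can be checked numerically. The sketch below assumes the same kind of made-up rank-two $\Gamma$ as a toy example (the loading values are hypothetical). It confirms that the ratio (3.1.2) evaluated at the leading latent vector equals the largest latent root, that randomly chosen coefficient vectors never exceed it, and that the number of non-zero roots equals the rank of $\Gamma$.

```python
import numpy as np

# Hypothetical loadings giving a Gamma of rank 2 (illustrative values only).
L = np.array([[0.8, 0.1],
              [0.7, 0.3],
              [0.2, 0.6],
              [0.1, 0.7]])
Gamma = L @ L.T

def explained_variation(l, Gamma):
    """Ratio (3.1.2): total variation explained by the factor variable l z'."""
    return (l @ Gamma @ Gamma @ l) / (l @ Gamma @ l)

roots, vectors = np.linalg.eigh(Gamma)      # eigh returns roots in ascending order
lam_max = roots[-1]                         # largest latent root of Gamma
m = vectors[:, -1]                          # corresponding latent vector

ratio_at_m = explained_variation(m, Gamma)

# Random trial vectors should never explain more than lam_max.
rng = np.random.default_rng(0)
trial_ratios = [explained_variation(rng.normal(size=4), Gamma) for _ in range(1000)]

n_nonzero_roots = int(np.sum(roots > 1e-10))
print(ratio_at_m, lam_max, max(trial_ratios), n_nonzero_roots)
```

The tolerance `1e-10` used to count non-zero roots is an arbitrary choice for exact arithmetic on a toy matrix; with estimated matrices no root is exactly zero, which is why a test of significance is needed.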

The factor loading of the variable $x_i$ on the first factor chosen above is
the correlation between the two. The covariance is

$$\operatorname{cov}(x_i, lz') = l_1 \gamma_{1i} + \cdots + l_p \gamma_{pi} = \lambda l_i,$$

and if the variables $x_i$ are initially chosen to have unit variance the correlation
is

$$\frac{\lambda l_i}{\sqrt{l \Gamma l'}} = \frac{\sqrt{\lambda}\, l_i}{\sqrt{l_1^2 + \cdots + l_p^2}}. \tag{3.1.5}$$

The factor loadings are then the elements of the first canonical vector suitably
standardized. Similarly the factor loadings of any other factor are derived
from the canonical vector defining the factor.

Even after exhausting all the independent factor variables, there still
remains some variation left in $x$ to be explained by the specific factors unless
the number of independent common factors is equal to $p$. In the problem
originally considered by Hotelling, the successive components explaining
variation in $x$ were not confined to the common factor portion $z$ but were also
functions of the specific factors $s$, which then are equivalent to linear functions
of $x$. Hotelling's principal components are, therefore, important in problems
where the total variation of a measurement vector $x$ is sought to be accounted
for, to the maximum amount possible, by a smaller number of linear functions
of $x$. The principal components of Hotelling are derived from the latent vectors
of the matrix $\Sigma = \Gamma + \Delta$ instead of $\Gamma$ alone as used above. It may be observed
that when

$$\Delta = \delta^2 I,$$

i.e., when all the specific variables have the same variance $\delta^2$, a latent vector
$l$ of $\Gamma + \Delta$ satisfying the equation

$$l(\Gamma + \delta^2 I) = \mu l$$

also satisfies the equation

$$l \Gamma = (\mu - \delta^2) l = \lambda l$$

and is therefore a latent vector of $\Gamma$. The principal component analysis of
Hotelling is thus a method of factor analysis with the factor loadings inflated
keeping the same relative magnitudes, when all the specific variances are the
same.
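
The equal-specific-variance case can be verified directly. The following sketch assumes a hypothetical $\Gamma$ and a common specific variance `delta_sq = 0.4`, both invented for illustration. It checks that the latent roots of $\Gamma + \delta^2 I$ are those of $\Gamma$ shifted by $\delta^2$, that the leading latent vectors coincide up to sign, and that loadings proportional to the square root of the root are inflated by a constant factor.

```python
import numpy as np

# Hypothetical common-factor dispersion matrix (illustrative values only).
L = np.array([[0.8, 0.1],
              [0.7, 0.3],
              [0.2, 0.6],
              [0.1, 0.7]])
Gamma = L @ L.T
delta_sq = 0.4                                   # common specific variance (assumed)
Sigma = Gamma + delta_sq * np.eye(4)

lam, vec_gamma = np.linalg.eigh(Gamma)           # roots and vectors used by PFA
mu, vec_sigma = np.linalg.eigh(Sigma)            # roots and vectors used by PCA

# mu = lambda + delta^2 for every root.
roots_shifted = np.allclose(mu - delta_sq, lam)

# The leading latent vectors agree up to an arbitrary sign.
same_first_vector = np.isclose(abs(vec_gamma[:, -1] @ vec_sigma[:, -1]), 1.0)

# Loadings scale with the square root of the root, so PCA loadings are the
# PFA loadings multiplied by this constant, which exceeds one.
inflation = np.sqrt(mu[-1] / lam[-1])
print(roots_shifted, same_first_vector, inflation)
```

When the specific variances differ, the shift is no longer a multiple of the identity and the two sets of latent vectors generally part company, which is the situation the text goes on to address.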

There is some arbitrariness in the above characterization of the basic
set of factors because instead of maximizing the sum of the variations explained
in $x_1, \ldots, x_p$ we could maximize a weighted sum and arrive at a
different basis and consequently a different set of factor loadings. When the
variables $x_1, \ldots, x_p$ are chosen to have unit variances the method adopted
is equivalent to using reciprocals of total variances as weights.

The quantity $\delta_i$, the residual variance of $x_i$ unexplained by the factor
variables, satisfies the equation

$$\sigma_{ii} = \frac{\lambda_1 l_i^2}{l l'} + \frac{\lambda_2 m_i^2}{m m'} + \cdots + \delta_i, \tag{3.1.6}$$

where $l, m, \ldots$ are the latent vectors defining the factors $lz', mz', \ldots$ and the
subscript $i$ relates to the $i$th element in the vectors.

The best formula for predicting a factor variable such as $lz'$ from the
observed measurements $x$ is obtained by the method of regression (16). If
$kx'$ is the predicted value, then $k$ satisfies the equation

$$k \Sigma = l \Gamma, \qquad k = l \Gamma \Sigma^{-1} = \lambda_1 l \Sigma^{-1}, \tag{3.1.7}$$

and similarly for other factors. The characterization of the basis considered
here together with methods of estimation is known as the principal factor
analysis (PFA) (7).


3.2 Second Characterization 


Instead of asking for a factor variable which explains as much of the variation
as possible of $x$, we may pose the problem in a different way. What is
that factor variable which is predictable from $x$ with the maximum possible
precision? Or in other words, what is that factor variable which is maximally
related to $x$? The solution to this problem depends on a canonical correlation
analysis of the hypothetical factor variables $z$ with the measurable variables
$x$ of which $z$ constitute a part.

If $lz'$ and $qx'$ represent two linear combinations of factor variables and
test scores, then according to the theory of canonical correlations (8) the
correlation (or its square) between the two linear functions

$$\frac{(l \Gamma q')^2}{(l \Gamma l')(q \Sigma q')} \tag{3.2.1}$$

has to be maximized. Using the algebra developed in a similar genetic problem
(14), the optimum value of the correlation $\nu$ is found to be a root of the
equation

$$|\Gamma - \nu^2 \Sigma| = 0, \tag{3.2.2}$$

or

$$|\Sigma - \lambda \Delta| = 0, \qquad \lambda = 1/(1 - \nu^2). \tag{3.2.3}$$

The vectors $l$ and $q$ are proportional and satisfy the same equation

$$l(\Sigma - \lambda \Delta) = 0, \qquad q(\Sigma - \lambda \Delta) = 0. \tag{3.2.4}$$

That factor variable which is most highly correlated with $x$ is $lz'$, where $l$ is
the latent vector corresponding to the largest root of the determinantal
equation (3.2.2). The second factor variable, uncorrelated with the first and
possessing the highest correlation with $x$, is $mz'$, where $m$ is the latent vector
corresponding to the second root of (3.2.2), and so on. We get as many
factors as the number of non-zero values of $\nu^2$ or values of $\lambda$ greater than
unity, which is the same as the rank of $\Gamma$.
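
The determinantal equation (3.2.3) can be solved as an ordinary symmetric latent-root problem for $\Delta^{-1/2} \Sigma \Delta^{-1/2}$. The sketch below assumes known population values of $\Gamma$ and $\Delta$, using made-up numbers, purely to illustrate the algebra. It counts the roots exceeding unity, recovers $\nu^2 = 1 - 1/\lambda$, and checks that the leading vector satisfies (3.2.4).

```python
import numpy as np

# Hypothetical population quantities (illustrative values only).
L = np.array([[0.8, 0.1],
              [0.7, 0.3],
              [0.2, 0.6],
              [0.1, 0.7]])
Gamma = L @ L.T
delta = np.array([0.35, 0.42, 0.60, 0.50])      # diagonal of Delta
Delta = np.diag(delta)
Sigma = Gamma + Delta

# |Sigma - lambda * Delta| = 0 is equivalent to the latent-root problem for
# M = Delta^{-1/2} Sigma Delta^{-1/2}.
d_inv_sqrt = 1.0 / np.sqrt(delta)
M = Sigma * np.outer(d_inv_sqrt, d_inv_sqrt)
lam, B = np.linalg.eigh(M)
lam, B = lam[::-1], B[:, ::-1]                  # largest root first

n_factors = int(np.sum(lam > 1.0 + 1e-10))      # roots greater than unity
nu_sq = 1.0 - 1.0 / lam                         # squared canonical correlations

# l = b Delta^{-1/2} satisfies l (Sigma - lambda Delta) = 0, equation (3.2.4).
l = B[:, 0] * d_inv_sqrt
residual = l @ (Sigma - lam[0] * Delta)

# Squared correlation between l z' and l x' computed directly from (3.2.1) with q = l.
direct_nu_sq = (l @ Gamma @ l) / (l @ Sigma @ l)
print(n_factors, nu_sq[0], direct_nu_sq)
```

The threshold `1e-10` above unity is an assumption suited to exact toy matrices; with sample matrices every root differs from unity and the count has to come from a significance test.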

For any factor $lz'$ as determined above

$$l \Gamma = l \Sigma - l \Delta = \frac{\lambda - 1}{\lambda}\, l \Sigma = (\lambda - 1)\, l \Delta,$$

$$\operatorname{cov}(x_i, lz') = l_1 \gamma_{1i} + \cdots + l_p \gamma_{pi} = (\lambda - 1)\, l_i \delta_i.$$

The correlation between $x_i$ and $lz'$,

$$\frac{\sqrt{\lambda - 1}\; l_i \delta_i}{\sqrt{\sigma_{ii}\; l \Delta l'}}, \tag{3.2.5}$$

is the factor loading of $x_i$ on the factor $lz'$. This is again an element of $l$
multiplied by a constant. It can be shown that the same factor loadings are
obtained if instead of $(x_1, \ldots, x_p)$ we consider $(c_1 x_1, \ldots, c_p x_p)$ with the
variables arbitrarily scaled. In the previous case it is necessary to reduce the
variables $(x_1, \ldots, x_p)$ to unit standard deviation before proceeding to
derive factors in order to achieve uniqueness of factor loadings.

To predict the factor measurements we use the regression equation as
in (3.1.7). In this case it turns out that $lz', mz', \ldots$, defined by the latent
vectors of (3.2.3), can be best predicted by $lx', mx', \ldots$, avoiding the
complication of multiplication by $\Sigma^{-1}$ necessary in the case of factors defined
in the earlier characterization of the basic set (3.1.7).

The residual variance $\delta_i$ in $x_i$ unexplained by the factor variables satisfies
the equation

$$\sigma_{ii} = \frac{(\lambda_1 - 1)\, l_i^2 \delta_i^2}{l \Delta l'} + \frac{(\lambda_2 - 1)\, m_i^2 \delta_i^2}{m \Delta m'} + \cdots + \delta_i, \tag{3.2.6}$$

similar to the formula (3.1.6) in the earlier case. This second characterization
of a basis together with methods of estimation may be called canonical
factor analysis (CFA) to bring out its connection with the theory of canonical
correlations.
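
The loading formula (3.2.5) and the scale-invariance claim can be illustrated as follows. This is a sketch under stated assumptions: the helper `cfa_loadings` is a name invented here, $\Delta$ is treated as known, and the numerical values are made up. With the latent vector normalized so that $l \Delta l' = 1$, the loading of $x_i$ reduces to $\sqrt{\lambda - 1}\, b_i \sqrt{\delta_i / \sigma_{ii}}$, where $b = l \Delta^{1/2}$ has unit length.

```python
import numpy as np

def cfa_loadings(Sigma, delta, k):
    """Loadings (3.2.5) on the first k canonical factors, given Sigma and diag(Delta)."""
    d_inv_sqrt = 1.0 / np.sqrt(delta)
    M = Sigma * np.outer(d_inv_sqrt, d_inv_sqrt)    # Delta^{-1/2} Sigma Delta^{-1/2}
    lam, B = np.linalg.eigh(M)
    lam, B = lam[::-1][:k], B[:, ::-1][:, :k]       # k largest roots, unit vectors b
    B = B * np.sign(B.sum(axis=0))                  # remove the arbitrary sign of each vector
    scale = np.sqrt(delta / np.diag(Sigma))         # sqrt(delta_i / sigma_ii)
    return np.sqrt(lam - 1.0) * B * scale[:, None]

# Hypothetical population quantities (illustrative values only).
L = np.array([[0.8, 0.1],
              [0.7, 0.3],
              [0.2, 0.6],
              [0.1, 0.7]])
delta = np.array([0.35, 0.42, 0.60, 0.50])
Sigma = L @ L.T + np.diag(delta)

# Rescale the measurements arbitrarily: x_i -> c_i x_i.
c = np.array([2.0, 0.5, 10.0, 1.0])
loadings = cfa_loadings(Sigma, delta, k=2)
loadings_rescaled = cfa_loadings(Sigma * np.outer(c, c), delta * c**2, k=2)

# Squared loadings sum to the proportion of variance not due to the specific factor.
communalities = (loadings ** 2).sum(axis=1)
print(np.allclose(loadings, loadings_rescaled))
```

The rescaling leaves $\Delta^{-1/2} \Sigma \Delta^{-1/2}$ unchanged, which is why the loadings do not move; the sum of squared loadings for each variable reproduces $1 - \delta_i / \sigma_{ii}$, in agreement with (3.2.6).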

Factor analysis thus fits in a general theory of canonical correlations 
involving two sets of variables: one set being observable and the other set, 
observable as in multiple regression; dummy as in multiple discrimination; 
or hypothetical as in problems of genetic selection. 


3.3 Which Is a Better Characterization? 


This question is meaningless if we are dealing with the true values of 
the dispersion elements satisfying the conditions of a given rank of the 
factor-variable space because one can be transformed into the other, and in 
fact they may be replaced by any other basic set and they all serve the same 
purpose. 

But this is no longer true when we have only estimates of the dispersion
elements and factors are estimated by formally substituting for $\Sigma$ the estimated
quantities and choosing $\Delta$ to satisfy the equation (3.1.6) in the first
case (PFA) and (3.2.6) in the second case (CFA). Which then is a better
estimate of a basis?

From the point of view of statistical estimation, PFA gives a least- 
squares estimate (16, p. 119) and CFA, a maximum-likelihood estimate, 
when normality of the distribution of the observations is assumed. At present 
there is not much to choose between the two methods except for the following 
reasons. The maximum-likelihood estimation leads in general to better 
results when the distribution of the variables is specified. No suitable test 
based on the least-squares estimates is available while there exists an easily 
computable test criterion on the basis of maximum-likelihood estimates. 
No further computations are needed to obtain the factor measurements if 
the factors are estimated by CFA; in fact, in this method, factors are deduced 
from a description of their measurement. 

There is another logical argument which may have to be borne in mind 
in deciding the issue. A rigorous hypothesis concerning the number of in- 
dependent factor variables is perhaps never true, and a test of this null hypo- 
thesis can detect its falsehood only when there is a serious departure. If then 
by following a rule of behavior (as determined by a test criterion) we decide 
to extract a certain number of factors, any method of estimation may be 
looked upon as providing only a summary of all the factors in terms of a few 
dominant ones having a definite existence with magnitudes bigger than 
standard errors calculable from the observations. It is then of interest to 
examine whether one method of estimation leads to a better summary than 
the other and at the same time has low errors of estimation. 

From this viewpoint, PFA may be thought of as providing the best k 
(given number not necessarily exhaustive) factors explaining the maximum
possible variance in the measurements while CFA, the best k factors which 
have in some sense highest possible correlations with the measurements. 
This may mean that while the first set attempts to explain as much as possible 
of the variations in the individual measurements, the latter set focuses on the 
correlations. Perhaps the psychological interest chiefly lies in the latter set, 
which offers a better explanation of the correlations between the measure- 
ments. 


4. Estimation and Tests of Significance for Factors 


4.1 Estimation of Factor Loadings 


Let $A = (a_{ij})$ denote the observed dispersion matrix of the vector
variable $x$. This is sufficient for the estimation of $\Gamma$ and $\Delta$, the two components
of the population dispersion matrix $\Sigma$. Following the equations (3.2.3, 3.2.4)
of the second characterization, we have on substituting $A$ for $\Sigma$

$$|A - \lambda \Delta| = 0, \qquad l(A - \lambda \Delta) = 0, \tag{4.1.1}$$

where $l$ is a latent vector corresponding to the latent root $\lambda$. From the point
of view of mechanical computations it is convenient to solve for

$$b = l \Delta^{1/2}, \qquad bb' = 1, \tag{4.1.2}$$

in which case $b$ is the latent vector of

$$|\Delta^{-1/2} A \Delta^{-1/2} - \lambda I| = 0. \tag{4.1.3}$$

Let us suppose, for the sake of illustration, that we are extracting two factors.
If $b = (b_1, \ldots, b_p)$ and $c = (c_1, \ldots, c_p)$ are the first two latent vectors of
(4.1.3) corresponding to the roots $\lambda_1$ and $\lambda_2$, then the equation (3.2.6) gives

$$a_{ii} = (\lambda_1 - 1) b_i^2 \delta_i + (\lambda_2 - 1) c_i^2 \delta_i + \delta_i, \tag{4.1.4}$$

or

$$\delta_i = \frac{a_{ii}}{(\lambda_1 - 1) b_i^2 + (\lambda_2 - 1) c_i^2 + 1} = \frac{a_{ii}}{g_i^2}, \tag{4.1.5}$$

where $g_i$ is defined by the last part of equation (4.1.5). The equation (4.1.3)
can now be written in terms of the observed correlation matrix $R$ instead of
the dispersion matrix $A$

$$|GRG - \lambda I| = 0, \tag{4.1.6}$$

where the elements $g_i$ of the diagonal matrix $G$ satisfy

$$g_i = \sqrt{(\lambda_1 - 1) b_i^2 + (\lambda_2 - 1) c_i^2 + 1}. \tag{4.1.7}$$
The computational problem is then to solve for the $g_i$ satisfying the equations
(4.1.6) and (4.1.7), where $\lambda_1$, $\lambda_2$ are the latent roots of (4.1.6) and $b$, $c$ the
latent vectors. A tentative method is to start with a trial matrix $G$ and obtain
successive approximations by solving (4.1.6) for $\lambda_1$, $\lambda_2$ and $b$, $c$, and substituting
in (4.1.7). The process is repeated until the $g_i$ converge.

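A better approximation to $g_i$ is obtained by using the formula

$$g_i = \sqrt{\left( \frac{\lambda_1}{\bar{\lambda}} - 1 \right) b_i^2 + \left( \frac{\lambda_2}{\bar{\lambda}} - 1 \right) c_i^2 + 1}, \tag{4.1.8}$$

where

$$\bar{\lambda} = \frac{\left[ \sum g_i^2 \right] - \lambda_1 - \lambda_2}{p - 2}; \tag{4.1.9}$$

the summation $[\sum g_i^2]$ refers to the $g_i$ at the previous stage used in equation
(4.1.7) to obtain $\lambda_1$, $\lambda_2$.

The two formulas (4.1.7) and (4.1.8) should agree towards the final
stages when convergence is expected to be slow. But in the initial stages
(4.1.8) may accelerate convergence.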

The estimated factor loadings on the first and second factors at any
stage of approximation are

$$\sqrt{\lambda_1 - 1}\; b G^{-1}, \qquad \sqrt{\lambda_2 - 1}\; c G^{-1}.$$


The same method holds good for any number of factors. The estimates of 
factor loadings obtained from equations (4.1.6, 4.1.7) can be shown to satisfy 
the maximum-likelihood equations of Lawley (11, 12) and thus constitute 
one out of a number of possible solutions. The equations (4.1.6, 4.1.7) are 
in a proper shape to admit an iterative procedure for solution. The use of 
equation (4.1.7) seems to avoid a difficulty which may occur in the iterative 
procedures. The iterative method given by Lawley (16, p. 130) may suffer 
a breakdown on the initial iteration due to an improperly chosen trial set 
of factor loadings leading to imaginary values of the quantities commonly 
designated by “$h_1, h_2, \ldots$.” This may also occur in PFA with the guessed 
communalities at the first stage. 


4.2 Tests of Significance and Estimation of Number of Factors 


It is also necessary to lay down some rules for determining the number 
of factors to be estimated. This is partially answered by any reasonable test 
for a specified number of factors. We determine that number of factors for 
which the chosen test does not show significance, while for any smaller 
number the hypothesis is contradicted. If the test is carried out 
at the 5 per cent level, then this method leads us to a lower confidence limit 
to the number of factors. That is, we can assert, with a risk of only 5 per cent, 
that the number of factors is at least as large as that discovered by the above 
procedure. This is no doubt an objective rule for determining the lower limit 
to the number of factors, but in practice it may be better to extract one or 
two more factors, depending on the magnitude of the residual roots. If one 
or two such roots are sufficiently bigger than unity (though not significantly 
so) it may be worth while to extract the factor corresponding to them also. 

The hypothesis we propose to test is that the population dispersion matrix 
admits the decomposition 


$$\Sigma = \Gamma + \Delta, \tag{4.2.1}$$


where $\Delta$ is a diagonal matrix with positive terms and $\Gamma$ is a positive 
semidefinite matrix of rank $k < p$. 

The test criterion we use is derived by the principle of likelihood ratio, 
assuming that the observations are normally distributed. 

The exact distribution of the test criterion is not known but in large 
samples $-2 \log$ of the likelihood ratio is distributed as $\chi^2$ with degrees of 
freedom equal to the number of independent restrictions on the elements of 
$\Sigma$ imposed by the hypothesis (4.2.1). This hypothesis specifies the rank of 
the matrix $\Sigma - \Delta$ for suitably chosen $\Delta$. If its rank is $k$, then by fixing the 
first $k$ rows and columns the rest of the elements can be computed, which 
implies $(p - k)(p - k + 1)/2$ restrictions. Allowing for $p$ unknown values 
in $\Delta$, the number of restrictions is equal to 








$$\frac{(p - k)(p - k + 1)}{2} - p = \frac{(p - k)^2 - p - k}{2}. \tag{4.2.2}$$

The test based on the likelihood-ratio criterion is 

$$-(n - 1) \log \frac{|A|}{|\hat\Sigma|}, \tag{4.2.3}$$


where $\hat\Sigma$ is the estimated dispersion matrix using the maximum-likelihood 
equations of section 4.1. The multiplying coefficient $(n - 1)$, where $n$ is the 
sample size, may be replaced by the more appropriate value for the $\chi^2$ 
approximation to hold when $n$ is not large, 

$$n - 1 - \frac{2p + 5}{6} - \frac{2k}{3},$$


where $p$ is the number of variables and $k$ is the number of factors (1). Since 
the roots of the equation $|\Sigma - \lambda\Delta| = 0$ corresponding to $k$ factors are 
estimated by $|A - \lambda\hat\Delta| = 0$ or the equivalent forms (4.1.3), (4.1.6), it follows 
that the roots of $|\hat\Sigma - \lambda\hat\Delta| = 0$ are 

$$\lambda_1, \ldots, \lambda_k, 1, \ldots, 1, \tag{4.2.4}$$

while the roots of $|A - \lambda\hat\Delta| = 0$ are, in descending order of magnitude, 

$$\lambda_1, \ldots, \lambda_k, \lambda_{k+1}, \ldots, \lambda_p, \tag{4.2.5}$$
and, therefore, 

$$\frac{|A|}{|\hat\Sigma|} = \lambda_{k+1} \cdots \lambda_p, \tag{4.2.6}$$

which is the product of the least $(p - k)$ roots, at the last stage of iteration of 
the equation (4.1.6), $|GRG - \lambda I| = 0$. 
The $\chi^2$ test is 

$$-(n - 1) \log (\lambda_{k+1} \cdots \lambda_p) \tag{4.2.7}$$

with $[(p - k)^2 - p - k]/2$ degrees of freedom apart from the slight refinement 
in the multiplying coefficient. 
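As a small numerical aid, the degrees of freedom and the test statistic can be computed as in the sketch below. The function names are hypothetical, and the refined multiplier is taken here to be $n - 1 - (2p + 5)/6 - 2k/3$, which is an assumption about the intended form of the small-sample correction; the residual roots must be supplied from the last stage of the iteration.

```python
import math

def factor_test_df(p, k):
    """Degrees of freedom [(p - k)^2 - p - k] / 2 of (4.2.2)."""
    return ((p - k) ** 2 - p - k) // 2

def refined_multiplier(n, p, k):
    """Assumed small-sample multiplier n - 1 - (2p + 5)/6 - 2k/3."""
    return n - 1 - (2 * p + 5) / 6 - 2 * k / 3

def chi_square_statistic(residual_roots, n, p, k, refined=False):
    """-(multiplier) * log(lambda_{k+1} ... lambda_p), as in (4.2.7)."""
    mult = refined_multiplier(n, p, k) if refined else n - 1
    return -mult * sum(math.log(r) for r in residual_roots)

print(factor_test_df(9, 2))   # 19, for nine variables and two factors
```

The statistic is then referred to the $\chi^2$ table with `factor_test_df(p, k)` degrees of freedom.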


4.3 A Modified Criterion and Its Practical Use 


It may be recalled that the likelihood-ratio criterion is the ratio of the 
maximum likelihood under the restrictions of the hypothesis (4.2.1) to that 
without any restrictions on $\Sigma$. It is of interest to examine how the ratio (or 
its logarithm) of the likelihoods is converging to the maximum value (or 
the negative of log ratio to its minimum value) during the iterative process. 
Fortunately this can be expressed in terms of the roots $\lambda_{k+1}, \ldots, \lambda_p$ at any 
stage of the iterative process 

$$-(n - 1)[\log (\lambda_{k+1} \cdots \lambda_p) - (p - k) \log \bar\lambda], \tag{4.3.1}$$

where $\bar\lambda$ of (4.1.9) is the arithmetic mean of $\lambda_{k+1}, \ldots, \lambda_p$. [Strangely the 
sequence (4.3.1) of statistics (likelihood ratios), which converge ultimately 
to the test criterion (maximum-likelihood ratio), resembles Bartlett’s (1) 
ratio test but, of course, the roots $\lambda_i$ are obtained differently and the ratios 
are used with different degrees of freedom. From this analysis it would appear 
that Bartlett’s ratio is an initial approximation to the actual test criterion.] 
(4.3.1) converges to 

$$-(n - 1) \log (\lambda_{k+1} \cdots \lambda_p)$$

at the final stage when 

$$(p - k) = \lambda_{k+1} + \cdots + \lambda_p. \tag{4.3.2}$$

Suppose that (4.3.1) is not significant as $\chi^2$ with $[(p - k)^2 - p - k]/2$ degrees 
of freedom, at any stage, then the same conclusion is reached even after 
completing the iterative process. If a test of significance is the only aim of 
analysis, then, sometimes, iteration can be stopped at some stage. Even if 
the result is significant, it is possible to terminate the computations provided 
the change in (4.3.1) at that stage is small from one cycle of operations to 
another. 
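The modified criterion (4.3.1) is easily evaluated at any stage from the residual roots alone. The following is a minimal sketch; the function name and the numerical roots are hypothetical and serve only to show the computation.

```python
import math

def modified_criterion(residual_roots, n):
    """(4.3.1): -(n - 1) [ log(product of roots) - (p - k) log(mean root) ]."""
    m = len(residual_roots)                      # m = p - k
    mean_root = sum(residual_roots) / m
    log_product = sum(math.log(r) for r in residual_roots)
    return -(n - 1) * (log_product - m * math.log(mean_root))

# Illustrative residual roots at some intermediate stage (made-up values).
value = modified_criterion([1.3, 1.1, 0.9, 0.8], n=200)
```

Since the geometric mean of positive roots never exceeds their arithmetic mean, the value is never negative, and it coincides with $-(n - 1) \log (\lambda_{k+1} \cdots \lambda_p)$ once the mean of the residual roots has reached unity.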

The modified criterion is extremely useful in practice when the object 
of the analysis is to estimate the number of factors (lower confidence value) 
as well as the factor loadings. 

Before proceeding with the cycle of operations for estimation, let us fix 
some high value of k as the number of factors and calculate the roots after 
one or two iterations. At this stage, find that value of r for which A, (with 
df. [(p — r)? — p — r]/2) is not significant, but A,_, is. This shows that the 
number of factors is not greater than r. We may set the number of factors 
provisionally at r and continue the process of estimation. Each time we may 
calculate A,_, and A, to see whether A,_, becomes not significant at any 
stage. If it is not significant, there is a case for switching over to (r — 1) 
factors instead of r. 


5. Summary 


The experimental situation and the nature of the data on which the 
technique of factor analysis can be successfully employed may be stated as 
follows. Each of the $p$ measurements on an individual has a linear regression 
on a common set of a few hypothetical variables or factors. The deviations 
from regression for any two measurements are uncorrelated. The factor 
analysis seeks the smallest number of independent hypothetical variables 
necessary to explain the intercorrelations between the measurements. 

If R is the observed correlation matrix, the computational problem of 
factor analysis depends on the solution of the diagonal matrix G satisfying 
the equations 


$$|GRG - \lambda I| = 0, \tag{5.1}$$

$$g_i = [(\lambda_1 - 1) a_{1i}^2 + \cdots + (\lambda_k - 1) a_{ki}^2 + 1]^{1/2}, \tag{5.2}$$

where $k$ is the number of factors assumed, $\lambda_1, \ldots, \lambda_k$ are the first $k$ largest 
roots of (5.1) and $a_j = (a_{j1}, \ldots, a_{jp})$ is the latent vector corresponding to 
the root $\lambda_j$. Once $G$ is found to satisfy the equations (5.1, 5.2), then the 
factor loadings are given by 

$$\sqrt{\lambda_j - 1}\, a_j G^{-1} \qquad (j = 1, \ldots, k),$$

and the test of the hypothesis that $k$ factors are adequate to explain the 
intercorrelations is 

$$\chi^2 = -(n - 1) \log_e (\lambda_{k+1} \cdots \lambda_p)$$

with $[(p - k)^2 - p - k]/2$ degrees of freedom. The lower confidence limit 
to the number of factors is the smallest value of $k$ for which $\chi^2$ is not 
significant. 

Some research remains to be done to find an elegant computational 
technique for solving the equations (5.1, 5.2). The method available at 
present is to guess suitable values of $g_i$, substitute in (5.1) and obtain better 
approximations to $g_i$ by using (5.2). This process is continued until convergence 
is secured. Unfortunately this appears to be a slow process unless the initial 
values of $g_i$ are very near the true values. Even with a good set of trial values 
the problem can be best tackled only on an electronic computer when large 
numbers of variables are involved. A suitable program for Illiac is being 
written by Mr. Golub of the Digital Computer Laboratory at the University 
of Illinois. A numerical example solved on a tentative program is reported 
below. Full details will be presented soon. 

First it may be noted that the relation between $g_i$ and the communality 
$h_i^2$ for the $i$th variate is 

$$g_i = 1/\sqrt{1 - h_i^2},$$

so that good trial values of $g_i$ are available once the communalities are 
approximately determined by an initial factorization of the correlation matrix 
by a simpler method, such as the centroid. Another method suggested in the 
literature is to choose the squared multiple correlation as an estimate of the 
communality. In many cases it is sufficient to start with the initial 
approximation $g_i = 1/2$. 
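The conversion from approximate communalities to trial values of $g_i$ is a one-line computation. The sketch below uses a hypothetical helper name and made-up communalities; it assumes every communality is strictly less than unity.

```python
import math

def trial_g(communalities):
    """Trial values g_i = 1 / sqrt(1 - h_i^2) from approximate communalities."""
    return [1.0 / math.sqrt(1.0 - h2) for h2 in communalities]

# Made-up communalities, e.g. from a preliminary centroid factorization.
g0 = trial_g([0.75, 0.64, 0.36])
```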

Second, although the test involves the product of the roots at the final 
stages of convergence, it is useful to compute at intermediate stages the 
statistic 


$$\chi^2 = -(n - 1)\left[\log_e (\lambda_{k+1} \cdots \lambda_p) - (p - k) \log_e \frac{\lambda_{k+1} + \cdots + \lambda_p}{p - k}\right],$$


which, when not significant, implies the nonsignificance of the ultimate $\chi^2$. 
We could stop at any stage after this, provided further iterations do not 
considerably alter the factor loadings. 

The following correlation matrix was presented by Davis (6) in an 
attempt to study factors of comprehension in reading. 


1.00 
12 1.00 
Al 34 1.00 
.28 36 16  ~—‘1.00 
52 3 34 30 ~—-:1.00 
71 71 43 36 .64 1.00 
.68 .68 42 35 05 76 ~=—-1.00 
51 52 .28 .29 45 OT 59 ~=1.00 
.68 .68 Al 36 R515) .76 .68 8 1.00 


Assuming a single factor, the $\chi^2$ was calculated and found to be significant. 
This indicated more than one factor. Under the hypothesis of two factors 
the value of $\chi^2$ 

$$-(n - 1)\left[\log (\lambda_3 \cdots \lambda_9) - (9 - 2) \log \frac{\lambda_3 + \cdots + \lambda_9}{9 - 2}\right]$$

came down to 29.73 at an early stage of iteration. This being less than 30.1, 
the 5 per cent significance value of $\chi^2$ with 19 degrees of freedom, the 
hypothesis of two factors stands unrejected. So the data admit an interpretation 
in terms of two significant factors only. A fairly stabilized set of factor 
loadings is 


Factor 1    .845    .817    .477    .401    .669    .891    .834    .651    .833 
Factor 2   -.309   -.084    .012    .153    .161    .145    .081    .122    .080 


I wish to thank Dr. C. F. Wrigley, who read the manuscript and offered 
some helpful comments. 
RELIABILITY FORMULAS FOR NONCOMPLETED OR 
SPEEDED TESTS* 


LOUIS GUTTMAN 

New formulas are developed to give lower bounds to the reliability 
of a test, whether or not all respondents attempt all items. The formulas 
apply in particular, then, to completed tests, pure speed tests, pure power 
tests, and any mixture of speed and power. For the case of completed tests, 
the formulas give the same answer as certain standard ones; for noncompleted 
tests the formulas give a correct answer where previous standard formulas are 
inappropriate. The formulas hold both in the sense of retest reliability and 
of parallel tests. 


I. Introduction 


Recently, there has been increasing awareness of an important inade- 
quacy of all standard formulas that are in current use for studying reliability 
of tests. These formulas are not appropriate for tests in which all items are 
not attempted by everybody. In particular, they do not hold for speeded 
tests (cf. 1). 

The present paper proposes a new analysis of the problem, and provides 
some practical formulas that hold whether or not the tests are completed. 
The case of completed tests emerges as a specialization of the present analysis. 
Thus, formulas developed here hold for pure speed tests, pure power tests, 
and for tests which are partly speed and partly power. 

An important example of one of the practical formulas developed here 
is as follows. Consider a test composed of m dichotomous items. Each item 
is scored unity if answered correctly, zero if answered incorrectly or not 
attempted. Each person’s total score is the sum of his scores on the m items. 
Suppose the test is administered once to a large population of individuals 
and that there are m — n items each of which has zero variance in its scores; 
that is, on each of these m — n items either all people scored 0 or all scored 1. 
In particular, all items not attempted by anybody are in this subset of 
m — n. The n items with positive variance will have their statistics on the 
single trial denoted as follows: 


$x_j$ = proportion of the population that answered the $j$th item correctly 
($j = 1, 2, \ldots, n$); 


$p_j$ = proportion of the population that attempted the $j$th item 
($j = 1, 2, \ldots, n$), regardless of whether the answer was correct 
or not. 


*This research was facilitated by an uncommitted grant-in-aid to the writer from 
the Behavioral Sciences Division of the Ford Foundation. 
Let $s_t^2$ denote the variance of total scores on the trial. It is immaterial whether 
$s_t^2$ is computed from all $m$ items or only from the $n$ of positive variance, 
since adding items with zero variance will not change $s_t^2$. Let $D'$ be defined by 

$$D' = \sum_{j=1}^{n-1} (n - j) \sqrt{x_j (1 - p_j)}, \tag{1}$$

and let $L_3'$ be defined as 

$$L_3' = \frac{n}{n - 1} \left(1 - \frac{\sum_{j=1}^{n} x_j (1 - x_j) + D'}{s_t^2}\right). \tag{2}$$

Then, if $\rho_t^2$ denotes the reliability coefficient for the total test scores, we 
prove below that $L_3'$ is a lower bound to $\rho_t^2$, i.e., 

$$L_3' \le \rho_t^2. \tag{3}$$

Another lower bound to $\rho_t^2$ derived below can be designated by $L_3''$. 
To compute it, first compute $D''$ by 

$$D'' = 2 \sum_{j=1}^{n-1} \left(\sqrt{1 - p_j} \sum_{g=j+1}^{n} \sqrt{x_g}\right), \tag{4}$$

and then use $D''$ in place of $D'$ in (2). 
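The two bounds can be computed directly from the single-trial proportions. The sketch below assumes the forms $D' = \sum_j (n - j)\sqrt{x_j(1 - p_j)}$, $D'' = 2\sum_j \sqrt{1 - p_j}\sum_{g>j}\sqrt{x_g}$, and a lower bound $\frac{n}{n-1}\left(1 - [\sum_j x_j(1 - x_j) + D]/s_t^2\right)$ with $D$ equal to $D'$ or $D''$; the function names and the numerical values are hypothetical, chosen only for illustration.

```python
import math

def d_prime(x, p):
    """D' = sum over j of (n - j) * sqrt(x_j (1 - p_j)), items numbered from 1."""
    n = len(x)
    return sum((n - j) * math.sqrt(x[j - 1] * (1.0 - p[j - 1])) for j in range(1, n))

def d_double_prime(x, p):
    """D'' = 2 * sum over j of sqrt(1 - p_j) * sum over g > j of sqrt(x_g)."""
    n = len(x)
    return 2.0 * sum(
        math.sqrt(1.0 - p[j]) * sum(math.sqrt(x[g]) for g in range(j + 1, n))
        for j in range(n - 1)
    )

def lower_bound(x, s2_total, d):
    """n/(n-1) * (1 - (sum of x_j (1 - x_j) + d) / s_t^2)."""
    n = len(x)
    item_variance = sum(xj * (1.0 - xj) for xj in x)
    return n / (n - 1) * (1.0 - (item_variance + d) / s2_total)

# Made-up single-trial data: proportions correct, proportions attempting,
# and the variance of total scores.
x = [0.9, 0.7, 0.4]
p = [1.0, 0.9, 0.6]
s2_total = 1.2
bound_1 = lower_bound(x, s2_total, d_prime(x, p))
bound_2 = lower_bound(x, s2_total, d_double_prime(x, p))
```

Whichever of the two values is larger is the more useful lower bound for the data at hand.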

Sometimes $L_3''$ may be better than $L_3'$, depending on whether $D'' < D'$ 
or not. A third lower bound, and one that is always better than either $L_3'$ 
or $L_3''$, is given by inequality (52) below. Directions which can lead to even 
better lower bounds are indicated at the end of this paper. Since $D'$ and $D''$ 
can be relatively substantial numbers compared to $s_t^2$ in noncompleted 
(especially speeded) tests, they can yield very low bounds $L_3'$ and $L_3''$, even 
negative (or useless) ones. In part, this can be due to the greater room for 
unreliability for noncompleted tests as compared with completed tests, 
and in part to a certain loss of information that occurs in the derivation of 
our present formulas. This loss can sometimes be made up by devising more 
specialized formulas for special cases. Our present formulas are for a very 
wide class of cases, and hence cannot be most efficient for every subclass 
separately. 

In (3), $\rho_t^2$ can be interpreted from the point of view either of retest 
reliability or of parallel tests; the same lower bound (2) ensues in each case 
(cf. 3). The same freedom of interpretation holds for all the formulas in 
the present paper, since we restrict ourselves to but a single trial for the 
actual numerical computations. 

To establish that $L_3'$ is a lower bound to $\rho_t^2$, we begin by using the results 
of a previous paper (3), wherein some important fundamental tautologies 
and inequalities for $\rho_t^2$ are developed that make no assumptions whatsoever. 
For practical use, a certain quantity denoted there by $\delta$ must be observable, 
or at least bounded from above. The contribution of the present paper is 
essentially to establish upper bounds to $\delta$ that are observable from a single 
trial, given a certain assumption discussed below. The quantities $D'$ and 
$D''$ defined in (1) and (4) are such bounds to $\delta$ for the type of test described 
above, and $L_3'$ and $L_3''$ are the resulting modifications (as computed from a 
single trial) of the lower bound to $\rho_t^2$ denoted by $\lambda_3^*$ in (3). 

Notice that if everybody attempts all items in the test just discussed, 
so that $p_j = 1$ ($j = 1, 2, \ldots, n$), then the radicals in the right of (1) and (4) 
vanish for all $j$, making $D' = D'' = 0$. In such a case, according to (2), 
both $L_3'$ and $L_3''$ become the same as the lower bound $L_3$ discussed in (2), 
or the usual lower bound for the case of completed tests wherein all items 
are experimentally independent. 

$$L_3' = L_3'' = \frac{n}{n - 1} \left(1 - \frac{\sum_{j=1}^{n} x_j (1 - x_j)}{s_t^2}\right) \qquad (D' = D'' = 0). \tag{5}$$

[As has been pointed out in (2), $L_3$ is algebraically the same as formulas 
deduced in other contexts by Kuder and Richardson and by Hoyt, but 
its derivation, interpretation, and use in (2) are quite different from those 
of the others by virtue of the differing contexts. The context in which $L_3$ 
was originally derived is a special case of the context of $L_3'$ and $L_3''$ or of the 
present paper. It is not clear at present whether the Kuder-Richardson or 
the Hoyt formulations can be readily extended to the problem of 
noncompleted or speeded tests.] $L_3'$ and $L_3''$ are more general than $L_3$ in that they 
allow for possible experimental dependence among the items due to 
noncompletion of the test. 

Other lower bounds to $\rho_t^2$ which allow for possible experimental 
dependence among items are also developed here, as well as bounds for tests in 
which the items are not dichotomies, or are not scored with 0-1 weights. 

As one of the referees of this paper has pointed out, it may be desirable 
also to estimate the total error variance itself and not just $\rho_t^2$. The variance 
in question is that denoted by $\bar\epsilon^2$ in 3, p. 229. Error variances will in general 
vary from individual to individual, especially in speeded tests, and $\bar\epsilon^2$ is 
their mean over all individuals. The lower bounds to $\rho_t^2$ are easily converted 
into upper bounds for $\bar\epsilon^2$ by virtue of the relationship $\bar\epsilon^2 = \sigma_t^2 (1 - \rho_t^2)$, where 
$\sigma_t^2$ is estimated by $s_t^2$. Thus $\bar\epsilon^2 \le s_t^2 (1 - L)$, where $L$ is any lower bound to $\rho_t^2$. 


II. Notation 


For the proofs, the notation of the previous paper (3) will be followed. 
A slight modification here is that m denotes the number of items, or part- 
scores, in the test, while n denotes the number of part-scores with observed 
variance greater than zero. Only these n actually variable part-scores affect 
the reliability of the total scores, and we get more efficient lower bounds by 
using $n$ in place of the larger number $m$. 

We consider here only the case where $n \ge 2$, since we wish largely to 
restrict ourselves to information about reliability that is obtainable from an 
internal analysis of but one trial of the test. This requires that the test have 
at least two subscores ($m \ge 2$), and in particular that at least two subscores 
each have a variance greater than zero ($n \ge 2$). In what follows, we assume 
that items of the test with positive observed variance are the only ones being 
considered. 

Let $x_{ijk}$ be the score of person $i$ on the $j$th item of the test (with positive 
variance) on trial $k$ ($j = 1, 2, \ldots, n$), and let $t_{ik}$ be the sum of the $n$ 
part-scores 

$$t_{ik} = \sum_{j=1}^{n} x_{ijk}. \tag{6}$$

The scoring scheme for any item can be arbitrary, except for the scores 
to be given to nonattempted items. We shall assume that a nonattempted 
item is given a score no higher than the lowest possible score for that item when 
attempted. More specifically, we shall assume that no negative scores are 
given to any item, and that nonattempted items are all scored zero. Should 
a scoring scheme originally allow for negative scores, it can easily be con- 
verted into the non-negative form we require by addition of a suitable constant 
to each part. This will not change the reliability coefficient $\rho_t^2$ in any way, 
nor any of the variances required in our formulas. A non-negative scoring 
scheme will yield total scores that correlate perfectly with those from the 
original scheme from which it is derived by this adding of constants. 

It should be clear that we are excluding from our present analysis the 
case where an incorrect answer to an item is scored lower than omitting that 
item. 

Let $m_j$ be the maximum score obtainable on the $j$th item. Our assumption 
is then that the scoring scheme is in such a form that for all $i$, $j$, and $k$ 

$$0 \le x_{ijk} \le m_j, \tag{7}$$

and in particular that $x_{ijk} = 0$ if person $i$ omits item $j$ on trial $k$. 

The population of persons and the universe of trials will both be assumed 
to be indefinitely large in order to avoid discussion here of sampling problems. 
Ultimately, only one trial from the universe need be made in practice to 
provide empirical data for use in our lower bounds. 

The expected values over the trials of the $x_{ijk}$ are denoted by $X_{ij}$, 

$$X_{ij} = E_k\, x_{ijk}. \tag{8}$$

The error of unreliability, or the experimental error, for person $i$ on item $j$ 
in trial $k$ is $x_{ijk} - X_{ij}$. The covariance over trials between two items $j$ and 
$g$ is defined separately for each individual $i$ and is denoted by $\gamma_{x_{ij} x_{ig}}$, 

$$\gamma_{x_{ij} x_{ig}} = E_k\, x_{ijk} x_{igk} - X_{ij} X_{ig}. \tag{9}$$
k 


It has been shown how the reliability coefficient $\rho_t^2$ of the total test score, 
as well as lower bounds to $\rho_t^2$, depend directly on the quantity $\delta$ defined 
from the covariances in (9) by 

$$\delta = \sum_{j \ne g} E_i\, \gamma_{x_{ij} x_{ig}}. \tag{10}$$

Another way of writing the right member of (10), which is more convenient 
for our purposes, is 

$$\delta = 2 \sum_{j=1}^{n-1} \sum_{g=j+1}^{n} E_i\, \gamma_{x_{ij} x_{ig}}. \tag{11}$$

The right member of (11) equals the right member of (10) by virtue of the 
fact that, from (9), $\gamma_{x_{ig} x_{ij}} = \gamma_{x_{ij} x_{ig}}$, or these covariances are symmetric 
in $g$ and $j$. 
If part scores $g$ and $j$ are experimentally independent (that is, statistically 
independent over trials) for the $i$th person, then $\gamma_{x_{ij} x_{ig}} = 0$. If these two 
part scores are independent for all persons in the population, then $E_i\, \gamma_{x_{ij} x_{ig}} = 0$. 
The converse need not hold, of course. Since nonzero covariances can be 
either positive or negative, we can have $E_i\, \gamma_{x_{ij} x_{ig}} = 0$ by having positive 
covariances for some people and negative ones for others. 
Similarly, if all items are statistically independent for all people, then 
$\delta$ must vanish. However, we can have $\delta = 0$ even though not all items are 
statistically independent for all people, for again positive and negative 
covariances within and/or between pairs of items can cancel each other. 
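The role $\delta$ plays in lower bounds to $\rho_t^2$ is illustrated by the third universal 
lower bound, $\lambda_3^*$, developed in (3), 

$$\lambda_3^* = \frac{n}{n - 1} \left(1 - \frac{\sum_{j=1}^{n} \sigma_{x_j}^2 + \delta}{\sigma_t^2}\right). \tag{12}$$
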
The variances in the right member of (12) are defined as follows. The notation 
for the expected (“true”) individual total scores on the test is 

$$T_i = E_k\, t_{ik}. \tag{13}$$

The respective over-all means for part and total scores are 

$$\xi_j = E_i\, X_{ij}, \qquad \tau = E_i\, T_i. \tag{14}$$

Then, 

$$\sigma_{x_j}^2 = E_i E_k (x_{ijk} - \xi_j)^2, \qquad \sigma_t^2 = E_i E_k (t_{ik} - \tau)^2. \tag{15}$$
Each of the four types of parameters defined in (14) and (15) is 
observable on a single trial. For any fixed value of $k$, consider only expectations 
over $i$: 

$$E_i\, x_{ijk}, \qquad E_i\, t_{ik}, \qquad E_i (x_{ijk} - E_i\, x_{ijk})^2, \qquad E_i (t_{ik} - E_i\, t_{ik})^2. \tag{16}$$

It has been shown (2) how the variance of each of these quantities is zero 
over trials, the basic assumption being that each respondent answers 
independently of the other respondents (that there is no cribbing one from the 
other, etc.). Thus, the probability is unity that in any given trial $k$, the 
four quantities in (16) are respectively equal to 

$$\xi_j, \qquad \tau, \qquad \sigma_{x_j}^2, \qquad \sigma_t^2. \tag{17}$$

If the population of respondents is not infinite, or if only a finite sample is 
drawn from an infinite population, then the variances of the quantities in 
(16) will be positive, and the quantities will be only estimates of the respective 
quantities in (17). As in the previous papers, we shall not be concerned here 
with sampling error and assume for convenience that the operators $E_i$ and 
$E_k$ are always over an infinite population or universe. 


III. Some Basic Identities 


We need some further notation to study what happens when items are 
not attempted. Let 


$$p_{ijk} = \begin{cases} 1 & \text{if person } i \text{ attempts item } j \text{ on trial } k, \\ 0 & \text{otherwise.} \end{cases} \tag{18}$$

Furthermore, let 

$$P_{ij} = E_k\, p_{ijk}, \qquad \pi_j = E_i\, P_{ij}. \tag{19}$$

Thus, $P_{ij}$ is the proportion of the trials in which person $i$ attempts the item $j$ 
and $\pi_j$ is the mean of such proportions over all individuals. Just like the 
quantities in (17), $\pi_j$ is observable from a single trial, namely by computing 
$E_i\, p_{ijk}$, or the proportion attempting the item in the given trial. The proof 
of this observability is of exactly the same nature as for the parameters in (17) 
(cf. 2). 

The universe of trials is thus divided into two sub-universes for each 
person and for each item: those trials in which he attempts the item (so 
that $p_{ijk} = 1$) and those in which he does not attempt the item (so that 
$p_{ijk} = x_{ijk} = 0$). 

Given notation (18), the following simple and important identity holds: 


$$p_{ijk} x_{ijk} = x_{ijk}. \tag{20}$$

For if $p_{ijk} = 0$, then $x_{ijk}$ is zero by our convention that unattempted items 
are scored zero. And if $p_{ijk} = 1$, then (20) holds by direct multiplication. 

Further notation needed refers to certain expected values of the $x_{ijk}$ 
or $x_{igk}$. Let $X_{ig}^{(j)}$ be the expected value of $x_{igk}$ over that subset of trials in 
which person $i$ attempts item $j$ (or for which $p_{ijk} = 1$), 


$$X_{ig}^{(j)} = E_k\, p_{ijk} x_{igk} / P_{ij}. \tag{21}$$

Stating (21) for the case where $g = j$, using (20), and remembering (8) yield 

$$X_{ij}^{(j)} = X_{ij} / P_{ij}. \tag{22}$$

Finally, we need a basic identity relating two covariances. Let $\gamma_{x_{ij} x_{ig}}^{(j)}$ 
denote the covariance between errors of unreliability, analogous to (9), 
for person $i$ on items $j$ and $g$, but only over the sub-universe of trials in 
which person $i$ attempts item $j$ (or where $p_{ijk} = 1$), 

$$\gamma_{x_{ij} x_{ig}}^{(j)} = \frac{E_k\, p_{ijk} x_{ijk} x_{igk}}{P_{ij}} - X_{ij}^{(j)} X_{ig}^{(j)}. \tag{23}$$

Using (20) and (22) in (23) and then multiplying through by $P_{ij}$ show that 

$$P_{ij} \gamma_{x_{ij} x_{ig}}^{(j)} = E_k\, x_{ijk} x_{igk} - X_{ij} X_{ig}^{(j)}. \tag{24}$$

Then, from (9) and (24), 

$$\gamma_{x_{ij} x_{ig}} = P_{ij} \gamma_{x_{ij} x_{ig}}^{(j)} + X_{ij} (X_{ig}^{(j)} - X_{ig}). \tag{25}$$

Identity (25) is our basic tool for examining the dependence among 
experimental errors due to noncompletion of tests. It breaks the over-all covariance 
between errors, $\gamma_{x_{ij} x_{ig}}$, into two component parts, as expressed by the right 
member. 
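Identity (25) can be checked numerically on a small artificial universe of trials for one person. In the sketch below the function name and the four equally likely trials are hypothetical; each trial records whether item $j$ was attempted, the score on item $j$ (zero when not attempted), and the score on item $g$, and the person is assumed to attempt item $j$ on at least one trial.

```python
def identity_25_sides(trials):
    """Return both members of (25) for one person.
    trials: list of (p_j, x_j, x_g), one tuple per equally likely trial,
    with x_j = 0 whenever p_j = 0."""
    count = len(trials)

    def mean(values):
        return sum(values) / count

    P = mean([p for p, _, _ in trials])                       # P_ij of (19)
    X_j = mean([xj for _, xj, _ in trials])                   # X_ij of (8)
    X_g = mean([xg for _, _, xg in trials])                   # X_ig of (8)
    cov = mean([xj * xg for _, xj, xg in trials]) - X_j * X_g           # (9)
    X_g_att = mean([p * xg for p, _, xg in trials]) / P                 # (21)
    X_j_att = X_j / P                                                   # (22)
    cov_att = mean([p * xj * xg for p, xj, xg in trials]) / P - X_j_att * X_g_att  # (23)
    right = P * cov_att + X_j * (X_g_att - X_g)                         # (25)
    return cov, right

# Four made-up trials; item j is not attempted on the last one.
left, right = identity_25_sides([(1, 1, 1), (1, 0, 1), (1, 1, 0), (0, 0, 0)])
```

The two returned values agree up to rounding, whatever scores are entered, provided the unattempted item is scored zero.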


IV. A Basic Assumption and Its Consequences 


Up until now, we have derived only identities or tautologies which are 
universally true, given a non-negative scoring scheme. The basic assumption 
from now on is that, if person $i$ attempts item $j$, then his score on any later 
item $g$ ($g > j$) will be experimentally independent of his score on this attempted 
item j. That is, we are considering here the case where dependence is due 
solely to omissions, so that if a part is attempted, no further experimental 
dependence holds. This may be true, for example, of pure speed tests as 
well as many other tests, including some pure power tests where omissions 
may be scattered and not consecutive. The experimental dependence between 
items discussed in (1) is in particular true of pure speed tests: if a person 
does not reach a certain item, he certainly will not reach later items on the 
same trial; thus the dependence is due to nonattempts or omissions. We 
shall assume that attempted items do not lead to experimental dependence 
but only to dependence among true or expected scores. In particular, we 
assume error covariances of the following type to vanish: 

$$\gamma_{x_{ij} x_{ig}}^{(j)} = 0 \qquad (g > j). \tag{26}$$

If hypothesis (26) is true, then (25) reduces to 

$$\gamma_{x_{ij} x_{ig}} = X_{ij} (X_{ig}^{(j)} - X_{ig}) \qquad (g > j). \tag{27}$$

According to (27), the total dependence over trials between item scores 
is a function of the three expected values in the right member. Should $X_{ig}^{(j)}$ 
equal $X_{ig}$, or the fact that the item $j$ is attempted does not change the 
expected value on the (later) item $g$, then we would have $\gamma_{x_{ij} x_{ig}} = 0$ according 
to (27), or we would be back to what is assumed (implicitly or explicitly) 
in all previous standard reliability formulas. But if $X_{ig}^{(j)} \ne X_{ig}$, then 
experimental dependence must hold between items $j$ and $g$ for person $i$. 


We now wish to establish a useful upper bound to $E_i\, \gamma_{x_{ij} x_{ig}}$. From (21), 
since $p_{ijk} \le 1$ and all quantities involved are non-negative, we see that 

$$X_{ig}^{(j)} \le X_{ig} / P_{ij}. \tag{28}$$

Using (28) in (27), and remembering (22), 

$$\gamma_{x_{ij} x_{ig}} \le X_{ij}^{(j)} X_{ig} (1 - P_{ij}) \qquad (g > j). \tag{29}$$

From (7) and (21), 

$$X_{ig}^{(j)} \le m_g. \tag{30}$$

Setting $g = j$ in (30) and using the result in (29) yield 

$$\gamma_{x_{ij} x_{ig}} \le m_j X_{ig} (1 - P_{ij}) \qquad (g > j). \tag{31}$$

Taking expected values of both members of (31) over $i$ yields 

$$E_i\, \gamma_{x_{ij} x_{ig}} \le m_j E_i\, X_{ig} (1 - P_{ij}) \qquad (g > j). \tag{32}$$

Now, from Schwarz’ inequality, 

$$E_i\, X_{ig} (1 - P_{ij}) \le \sqrt{E_i\, X_{ig}^2 \; E_i (1 - P_{ij})^2}. \tag{33}$$

Also, from (7) (written with $g$ in place of $j$), 

$$X_{ig}^2 \le m_g X_{ig}, \tag{34}$$

and from the fact that $0 \le P_{ij} \le 1$, 

$$(1 - P_{ij})^2 \le 1 - P_{ij}. \tag{35}$$
Using (35) and (34) in (33), and then the result in (32) [remembering (14)] 
produces the desired inequality 

$$E_i\, \gamma_{x_{ij} x_{ig}} \le m_j \sqrt{m_g \xi_g (1 - \pi_j)} \qquad (g > j). \tag{36}$$

In (36), both $\pi_j$ and $\xi_g$ are observable from a single trial. 
The upper bound to $\delta$ that we are seeking is obtained from (36) by 
summing both members over $g$ and $j$ (except for $g = j$) according to (11), 

$$\delta \le 2 \sum_{j=1}^{n-1} \left(m_j \sqrt{1 - \pi_j} \sum_{g=j+1}^{n} \sqrt{m_g \xi_g}\right). \tag{37}$$

In a test composed only of dichotomous items, each of which is scored 
zero or unity, we have $m_j = 1$. For such a test, (37) reduces to 

$$\delta \le 2 \sum_{j=1}^{n-1} \left(\sqrt{1 - \pi_j} \sum_{g=j+1}^{n} \sqrt{\xi_g}\right) \qquad (m_j \le 1). \tag{38}$$

Inequality (38) holds for $m_j < 1$ as well as for $m_j = 1$, and we have so stated 
it; this is the formula given in 3, p. 64. What we have defined above as 
$D''$ in (4) is the right member of (38) as computed from a single trial. 
Reviewing the proof shows that (37) holds for a much less restrictive 
hypothesis than (26). We can assume the inequality 

$$\gamma_{x_{ij} x_{ig}}^{(j)} \le 0 \qquad (g > j) \tag{39}$$

in place of only the equality of (26), and again arrive at (37). Under what 
circumstances an actually negative error covariance can be justifiably 
hypothesized remains a problem to be explored. 


V. Another Upper Bound for 6 


Using assumption (26), or (39), it is possible to arrive at other useful 
inequalities for $E_i\, \gamma_{x_{ij} x_{ig}}$ and for $\delta$ in place of (36) and (37). For example, 
the following inequality will be established: 

$$E_i\, \gamma_{x_{ij} x_{ig}} \le \tfrac{1}{2} m_g \sqrt{m_j \xi_j (1 - \pi_j)} \qquad (g > j). \tag{40}$$

For the proof of (40), use (30) in (27) to obtain 

$$\gamma_{x_{ij} x_{ig}} \le X_{ij} (m_g - X_{ig}) \qquad (g > j). \tag{41}$$

Since the right members of (41) and (31) are always non-negative, their 
geometric mean is never smaller than the smaller of the two, so we can 
write from (41) and (31) that 

$$\gamma_{x_{ij} x_{ig}} \le \sqrt{m_j X_{ij} (1 - P_{ij}) X_{ig} (m_g - X_{ig})} \qquad (g > j). \tag{42}$$

Notice that the left member of (42) may be negative, or that (42) does 
not refer to the absolute value of $\gamma_{x_{ij} x_{ig}}$. Now, the quantity $X_{ig} (m_g - X_{ig})$, 
when regarded as a function of $X_{ig}$, reaches a maximum when $X_{ig} = m_g/2$, 
so we always have 

$$X_{ig} (m_g - X_{ig}) \le m_g^2 / 4. \tag{43}$$

Using (43) in (42) yields 

$$\gamma_{x_{ij} x_{ig}} \le \tfrac{1}{2} m_g \sqrt{m_j X_{ij} (1 - P_{ij})} \qquad (g > j). \tag{44}$$

From Schwarz’ inequality, and then notation (14) and (19), 

$$E_i \sqrt{X_{ij} (1 - P_{ij})} \le \sqrt{\xi_j (1 - \pi_j)}. \tag{45}$$

Taking expectations over $i$ of both members of (44) and then using (45) 
yield the desired inequality (40). 

Summing both members of (40) over $g$ and $j$, remembering (11), yields 
another upper bound to $\delta$, 

$$\delta \le \sum_{j=1}^{n-1} \left(\sqrt{m_j \xi_j (1 - \pi_j)} \sum_{g=j+1}^{n} m_g\right). \tag{46}$$

For the special case of a test composed of dichotomous items scored 0 or 1, 
or where $m_j = 1$, we have 

$$\sum_{g=j+1}^{n} m_g = n - j \qquad (m_g = 1), \tag{47}$$

so that for this special case (46) reduces to 

$$\delta \le \sum_{j=1}^{n-1} (n - j) \sqrt{\xi_j (1 - \pi_j)} \qquad (m_j = 1). \tag{48}$$

What we have defined as $D'$ in (1) above is the right member of (48) as 
computed from a single trial. 


VI. A Third and Better Upper Bound for $\delta$; Further Possibilities


It is helpful in discussing the bounds to consider first the special case
of scoring where $m_j = 1$. For this case, (36) becomes

$$E\,\gamma_{e_{ij}e_{ig}} \le \sqrt{\xi_g (1 - \pi_j)} \qquad (m_j = 1,\ g > j), \tag{49}$$

while (40) becomes

$$E\,\gamma_{e_{ij}e_{ig}} \le \tfrac{1}{2} \sqrt{\xi_j (1 - \pi_j)} \qquad (m_j = 1,\ g > j). \tag{50}$$

Which of these provides a better bound to $E\,\gamma_{e_{ij}e_{ig}}$? That is, which has
the smaller right member?
In one respect, inequality (50) is better than (49): it has the factor 1/2.
Clearly, (49) will be better than (50) if and only if $\xi_g < \xi_j/4$ $(g > j)$. Therefore,
if we define $\varepsilon_{jg}$ by

$$\varepsilon_{jg} = \begin{cases} \sqrt{\xi_g (1 - \pi_j)} & \text{if } \xi_g < \xi_j/4 \\ \tfrac{1}{2}\sqrt{\xi_j (1 - \pi_j)} & \text{if } \xi_g \ge \xi_j/4 \end{cases} \qquad (m_j = 1,\ g > j), \tag{51}$$

then we have an improved bound for $\delta$:

$$\delta \le 2 \sum_{j=1}^{n-1} \sum_{g=j+1}^{n} \varepsilon_{jg}. \tag{52}$$

Inequality (52) is sharper than either (38) or (48). More generally, if $m_j \ne 1$,
we can define $\varepsilon_{jg}$ to be the smaller of the two right members of (36) and (40)
and again write (52).

It is interesting that the matrix of the average error covariances, that is,
of the $E\,\gamma_{e_{ij}e_{ig}}$, is bounded in both (40) and (36) by a simplex matrix as
defined in (4). A simplex matrix is a symmetric matrix whose elements
are products of the form $a_j b_g$ $(g > j)$. In (36), we can write $a_j = m_j \sqrt{1 - \pi_j}$
and $b_g = \sqrt{m_g \xi_g}$, while in (40) we can write $a_j = \sqrt{m_j \xi_j (1 - \pi_j)}$ and
$b_g = \tfrac{1}{2} m_g$.

There are important special kinds of tests which necessarily have internal
simplex features and not only a simplex type of upper bound matrix for
error covariances. Three such are: (a) a pure speed test (where everything
attempted is done correctly); (b) a test composed of a single question like
“Write down all the words you can that begin with the letter t”; (c) a
power test in which, if a person decides not to try the jth item, he will try
no more items. In each such case, it follows from notation (18) that

$$p_{ijk}\, p_{igk} = p_{igk} \qquad (g > j). \tag{53}$$

Condition (53) states that if person i does not attempt item j, then he does
not attempt any items beyond j. Or if he cannot produce j words beginning
with the letter t, he cannot produce g words where g > j.
Multiplying (53) through by $x_{igk}$ and using (20),

$$p_{ijk}\, x_{igk} = x_{igk} \qquad (g > j). \tag{54}$$
Using (54) in (21) yields 

xs = X,,/P,; (9 > J), (55) 
showing that the inequality in (28) cannot be improved; its upper limit is 
actually attained in our present special case of (55). Certain further in- 
equalities above correspondingly become equalities. Furthermore, it becomes 
feasible to obtain better bounds than those based on \% by pivoting instead 


on \% of (3). 
To return to the general case where (53) does not hold, further formulas 
are also possible of the split-half type, based on $\lambda_4$ of (3). The new $\delta$ to be
bounded consists of the covariance between two sums of errors, one sum from
each half of the split. This covariance is always a sum of item error covariances,
and can be bounded immediately by using our formulas for $\varepsilon_{jg}$ above. The
$\varepsilon_{jg}$ should be summed the way the split calls for, and the result can be used
to bound $\delta$ in the formula for $\lambda_4$. Using split-halves, while it requires only
two sub-variances to be computed, does not avoid the problem of taking
into account the possible experimental dependence between the halves,
and this can be studied rigorously only itemwise, as through $\varepsilon_{jg}$.





Erratum in Guttman, Louis. Reliability formulas that do not assume 
experimental independence. Psychometrika, 1953, 18, 225-239. 

On page 231 in formula (21) and in each of the preceding two lines, 
X;, should replace x; throughout. 
A MATHEMATICAL MODEL FOR CONDITIONING*


G. W. BOGUSLAVSKY
CORNELL UNIVERSITY


It is postulated that occurrence of a conditioned response depends on 
recurrence of one of a finite number of specific vigilance reactions. Number 
of trial on which a conditioned response occurs is shown to be a sufficient 
statistic for estimating the number of such vigilance reactions. The hypothesis 
is tested by noting whether numbers of trials on which conditioned responses 
occur fall within confidence intervals determined on the basis of a selected 
sufficient statistic. Applications of the model to psychological research are 
suggested. 


I. Introduction and Postulates 


Systematic treatment of behavior has generally followed the pattern of 
functional relation between stimuli and responses, with intervening processes 
inferred from these two variables and viewed as theoretical constructs. 
There is reason to believe, however, that in some instances such processes 
may be treated as independent events. The specific reference is to behavior 
patterns which Pavlov grouped under the term “orienting reflex” (13, p. 134). 
Though Pavlov insisted that these disappear with the progress of conditioning 
(14, p. 94), Guthrie has presented a convincing argument to the contrary 
(5, p. 74), and one of Pavlov’s own statements (12, p. 385) may be interpreted 
as refuting the original thesis. 

A series of observations at the Cornell Behavior Farm has led the 
author to conclude that occurrence of orientation to conditioned stimuli 
is the rule rather than the exception. The more conspicuous features of this 
phenomenon are: circumscribed variability of pattern, synergy of action, 
facilitating effect of the general static reaction on the ensuing activity, 
and autonomic concomitants manifested by changes in respiration and 
cardiac output. All of these observations have either direct or inferential 
support in scientific literature (4, p. 3; 16, pp. 129, 305, 342; 2, p. 505; 19, p. 
13; 7, p. 668; 10, p. 139). 

It is apparent that inclusion of orienting behavior in a theoretical 
model would improve accuracy of prediction. However, since precise desig- 
nation of the reactions and of the conditions governing their emergence is 
impractical, the use of monotonic functions to express relations between
variables must be abandoned. Accordingly, the present model has been
designed on the basis of a system of operations which do not depend on the
full knowledge of antecedent conditions.

*From a doctoral dissertation at Cornell University. The author wishes to acknowledge
the invaluable advice and help of Professor H. S. Liddell, under whose direction this research
was conducted. A special debt of gratitude is due to Dr. Jack Kiefer of the Cornell department
of mathematics, whose skill and interest aided materially in the development of the
mathematical portions of this paper.

In view of the limited connotation of the term “orienting,” the author 
will follow Liddell’s precedent (9, p. 160) in referring to the animal’s immediate 
responses to the conditioned stimulus as specific vigilance reactions, or, in 
abbreviated form, as SVR’s. 

The following postulates state formally the author’s theoretical position: 

1. In a given situation a supraliminal sensory stimulus evokes in an 
organism one of a bounded set of N discrete and mutually exclusive specific 
vigilance reactions. 

2. Sensory stimulation immediately consequent upon the performance of 
each specific vigilance reaction becomes a conditioned stimulus for any response 
with which it is contiguous. 

The first proposition implies that, with the occurrence of each stimulus, 
the N members of the set of SVR’s compete one against another, with the 
ultimate outcome determined by unspecified factors extraneous to the 
stimulus. As long as the respective probabilities of the several outcomes 
are unknown, the author must assume, in the present development, that the 
outcomes are equally likely. The proposition does not, however, preclude 
formulations based on empirically evaluated probabilities of identifiable 
intervening variables, though the problem of development along these lines 
would be considerably more difficult. 

The second proposition implies that appearance of a conditioned response 
is contingent on the recurrence of any member of the set. In the present 
development the first recurrence is assumed to be the necessary and sufficient 
condition. The assumption of the first recurrence as the necessary and 
sufficient condition is based on arguments appearing in psychological litera- 
ture (5). A development of the theoretical position, stated in the two postu- 
lates, on the assumption of the necessity and sufficiency of nth recurrence 
is no less feasible, though much more cumbersome. 

The implications of the two postulates will now be examined with the 
aid of the classical occupancy problem serving as the model. 


II. Probability Distribution of $n_k$


Stimuli are presented one at a time, independently of each other, the 
probability of a given stimulus evoking each of the SVR’s being 1/N. 

An instance of recurrence is defined as evocation of an SVR which had 
been evoked on one or more preceding trials. The statement “‘k instances of 
recurrence” refers to the number of trials characterized by such recurrences. 
It does not imply that the same SVR has occurred k times; the number of 
SVR’s involved in k instances of recurrence may have any integral value 
from 1 to k inclusive. 
The variable $n_k$ is defined to be the number of the trial on which the
kth instance of recurrence takes place. By deduction from the postulates it
is also the number of the trial on which the conditioned response appears
for the kth time.

As an illustration of the preceding definitions, consider the four suits
of a deck of cards as representing four different SVR’s. In drawing with
replacements the sequence H, S, H, D, S, S, C, one obtains three instances
of recurrence: that of H on the third drawing, and that of S on the fifth
and sixth drawings. Thus, $n_1 = 3$, $n_2 = 5$, and $n_3 = 6$. Since all suits appeared
during the sequence, all subsequent drawings will be instances of recurrence.
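
The card illustration can be checked mechanically. The following is a minimal sketch, not part of the original development; the function name `recurrence_trials` is an assumption introduced here. It scans a sequence of outcomes and records every trial on which an outcome already seen occurs again.

```python
def recurrence_trials(sequence):
    """Return the 1-based trial numbers on which an earlier outcome recurs."""
    seen = set()      # outcomes (SVR's) evoked so far
    trials = []       # trial numbers n_1, n_2, ... of the recurrences
    for trial, outcome in enumerate(sequence, start=1):
        if outcome in seen:
            trials.append(trial)   # an instance of recurrence
        else:
            seen.add(outcome)      # a novel outcome
    return trials


draws = ["H", "S", "H", "D", "S", "S", "C"]
print(recurrence_trials(draws))  # [3, 5, 6]
```

The printed list corresponds to the trial numbers of the first, second, and third recurrences in the illustration.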

Because $n_k$ trials include k instances of recurrence, the total number
of different SVR’s evoked is $n_k - k$. Also, since the $n_k$th trial is one during
which an instance of recurrence takes place, the total number of different
SVR’s evoked immediately prior to that trial is also $n_k - k$. Thus, at the
end of the $(n_k - 1)$th trial as well as at the end of the $n_k$th trial, the animal has
in its repertory $N - n_k + k$ different unevoked SVR’s. Accordingly, the probability
distribution of $n_k$ may be written as

$P_N\{n_k = j\}$ = (probability that after j − 1 trials the animal’s repertory
contains N − j + k different unevoked SVR’s) ×
(probability that on jth trial one of the previously
evoked j − k SVR’s is elicited again).


Evaluations of these two probabilities have been derived by Feller
(3, pp. 69, 313). Allowing for differences in symbols, the substitution yields

$$P_N\{n_k = j\} = \Big[\binom{N}{N-j+k} \sum_{\nu=0}^{j-k} (-1)^\nu \binom{j-k}{\nu} \Big(1 - \frac{N-j+k+\nu}{N}\Big)^{j-1}\Big] \frac{j-k}{N}$$

$$= \binom{N-1}{j-k-1} \sum_{\nu=0}^{j-k} (-1)^\nu \binom{j-k}{\nu} \Big(\frac{j-k-\nu}{N}\Big)^{j-1}. \tag{1}$$

For the cumulative distribution of $n_k$ another readily verifiable function
derived by Feller (3, p. 77) is directly applicable.

$P_N\{n_k \le j\}$ = probability that by the end of jth trial there have been
k or more instances of recurrence of SVR’s
= probability that by the end of jth trial there are at
least N − j + k different SVR’s not yet evoked

$$= \sum_{\nu=0}^{j-k} (-1)^\nu \binom{N-j+k+\nu-1}{\nu} \binom{N}{N-j+k+\nu} \Big(1 - \frac{N-j+k+\nu}{N}\Big)^{j}. \tag{2}$$
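
As a numerical check on the two occupancy expressions, the sketch below evaluates the probability function and the cumulative function of $n_k$ with exact rational arithmetic. It is an illustration under stated assumptions rather than the author's computation: the function names `pmf_nk` and `cdf_nk` are introduced here, and the formulas are the standard occupancy results (exactly a given number of empty cells, and at least a given number of empty cells) specialised to this problem.

```python
from fractions import Fraction
from math import comb, factorial


def pmf_nk(N, k, j):
    """P_N{n_k = j}: exactly d = j - k distinct SVR's in the first j - 1 trials,
    followed by a repetition of one of them on trial j."""
    d = j - k
    if d < 1 or d > N:
        return Fraction(0)
    inner = sum((-1) ** v * comb(d, v) * Fraction(d - v, N) ** (j - 1)
                for v in range(d + 1))
    return comb(N - 1, d - 1) * inner


def cdf_nk(N, k, j):
    """P_N{n_k <= j}: at least m = N - j + k SVR's still unevoked after j trials."""
    m = N - j + k
    if m <= 0:
        return Fraction(1)   # k recurrences are certain by trial N + k
    return sum((-1) ** v * comb(m + v - 1, v) * comb(N, m + v)
               * Fraction(N - m - v, N) ** j
               for v in range(N - m + 1))


# For k = 1 the probability function has the closed form N!(j-1) / (N^j (N-j+1)!).
N, j = 6, 4
print(pmf_nk(N, 1, j) == Fraction(factorial(N) * (j - 1),
                                  N ** j * factorial(N - j + 1)))
```

Exact fractions are used so that the agreement between the cumulative function and the running sum of the probability function can be tested without rounding error.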
III. Sufficiency of the Statistic $n_k$


For a discrete case, $t_n = t(x_1, x_2, \cdots, x_n)$ is said to be a sufficient statistic
for the parameter N if, whenever $p_N(t_n) > 0$, the conditional probability
function $p_N(x_1, \cdots, x_n \mid t_n)$ does not depend on N.

A necessary and sufficient condition that $t_n$ be sufficient is that the
joint probability function of the $x_i$’s can be written

$$p_N(x_1, \cdots, x_n) = g(t_n, N)\, h(x_1, \cdots, x_n), \tag{3}$$

where $g \ge 0$, $h \ge 0$, g depends on the $x_i$’s only through the function $t_n$, and h
depends on the $x_i$’s in any way, but does not depend on N.

It will now be shown that the joint probability distribution of the
variables $n_i$, $1 \le i \le k$, can be written in the form (3), where $n_k$ takes the
place of $t_n$, and that, therefore, $n_k$ is a sufficient statistic for the parameter N.

Let $X_i + 1$ be the number of trials between the (i − 1)th and ith instances
of recurrence of SVR’s (not counting the former, but counting the latter).
For i = 1,

$P_N\{X_1 = x_1\}$ = probability that the first $x_1$ stimuli evoke $x_1$ different
SVR’s, and the $(x_1 + 1)$th stimulus evokes an SVR
which had occurred earlier,

$$= \frac{N}{N} \cdot \frac{N-1}{N} \cdot \frac{N-2}{N} \cdots \frac{N-x_1+1}{N} \cdot \frac{x_1}{N} = \frac{N!\, x_1}{N^{x_1+1}(N - x_1)!}. \tag{4}$$

The conditional probability distributions for the variables $X_i$, $i > 1$,
may be derived in an analogous manner. For i = 2,

$$P_N(x_2 \mid x_1) = \frac{N-x_1}{N} \cdot \frac{N-x_1-1}{N} \cdots \frac{N-x_1-x_2+1}{N} \cdot \frac{x_1+x_2}{N} = \frac{(N - x_1)!\,(x_1 + x_2)}{N^{x_2+1}(N - x_1 - x_2)!}. \tag{5}$$

Designating for convenience

$$S_\nu = \sum_{i=1}^{\nu} x_i, \tag{6}$$

the conditional probability distribution for the general case is

$$P_N(x_\nu \mid x_1, \cdots, x_{\nu-1}) = \Big[\prod_{i=0}^{x_\nu - 1} \Big(1 - \frac{S_{\nu-1} + i}{N}\Big)\Big] \frac{S_\nu}{N} = \frac{(N - S_{\nu-1})!\, S_\nu}{N^{x_\nu+1}(N - S_\nu)!}. \tag{7}$$
The joint probability distribution of the variables $X_i$, $1 \le i \le k$, is
obtained by multiplying the probability distribution of the initial variable $X_1$
by the product of the conditional probability distributions of the remaining
variables $X_i$, $i > 1$,

$$P_N(x_1, \cdots, x_k) = P_N(x_1) \prod_{\nu=2}^{k} P_N(x_\nu \mid x_1, \cdots, x_{\nu-1})$$

$$= \frac{N!\, x_1}{N^{x_1+1}(N - x_1)!} \prod_{\nu=2}^{k} \frac{(N - S_{\nu-1})!\, S_\nu}{N^{x_\nu+1}(N - S_\nu)!}$$

$$= \frac{N!}{N^{S_k+k}(N - S_k)!} \prod_{\nu=1}^{k} S_\nu \tag{8}$$

$$= \frac{N!}{N^{n_k}(N - n_k + k)!} \prod_{\nu=1}^{k} (n_\nu - \nu), \tag{9}$$

where $n_\nu = S_\nu + \nu$, the relationship being derived from definitions presented
earlier.

Inspection of (9) shows that the function represented by the first factor
does not depend on the variables $n_i$, $i < k$, except through $n_k$; the function
represented by the remaining product does not depend on N. Thus the
criterion of (3) is satisfied, showing that $n_k$ is a sufficient statistic. This
implies that estimation of the parameter N may be made solely from the
knowledge of $n_k$, and that knowledge of the values of $n_i$, $i < k$, provides
no additional information.
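
The sufficiency argument can be illustrated numerically. The sketch below is an illustration only; the names `joint_prob` and `conditional_prob` are assumptions introduced here. It evaluates the joint probability of the recurrence trials as the product of an N-dependent factor, $N!/[N^{n_k}(N - n_k + k)!]$, and the N-free product of the quantities $n_\nu - \nu$, and then confirms that the conditional probability of the earlier recurrence trials, given $n_k$, is the same for two quite different values of N.

```python
from fractions import Fraction
from itertools import combinations_with_replacement
from math import factorial, prod


def joint_prob(N, ns):
    """Joint probability of recurrence trials ns = (n_1, ..., n_k).

    Assumes ns is an admissible sequence, i.e. the values n_v - v are
    at least 1 and non-decreasing in v."""
    k, n_k = len(ns), ns[-1]
    if n_k - k > N:
        return Fraction(0)   # more distinct SVR's than the repertory holds
    first = Fraction(factorial(N), N ** n_k * factorial(N - n_k + k))
    return first * prod(n - v for v, n in enumerate(ns, start=1))


def conditional_prob(N, ns):
    """P{n_1, ..., n_(k-1) | n_k}, summing the joint over all admissible histories."""
    k, n_k = len(ns), ns[-1]
    total = Fraction(0)
    # Non-decreasing tuples (S_1, ..., S_(k-1)) with 1 <= S_v <= n_k - k.
    for s in combinations_with_replacement(range(1, n_k - k + 1), k - 1):
        history = tuple(sv + v for v, sv in enumerate(s, start=1)) + (n_k,)
        total += joint_prob(N, history)
    return joint_prob(N, ns) / total


observed = (3, 5, 6)   # the card illustration: n_1 = 3, n_2 = 5, n_3 = 6
print(conditional_prob(20, observed) == conditional_prob(500, observed))
```

Because the N-dependent factor is common to every history with the same $n_k$, it cancels exactly in the ratio, which is what the comparison exhibits.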


IV. Tests of Hypotheses 


The foregoing discussion indicates that description of the progress of 
conditioning is reducible to the single parameter N. Although many specific 
vigilance reactions may be identified with accuracy from their unique postural 
components, precise evaluation of N by direct observation is hardly feasible 
at this stage. Accordingly, the experimenter must resort to estimation of N 
from information gathered during the process of conditioning. One approach 
is to proceed with presentation of stimuli until the kth occurrence of the 
conditioned response, k being a previously selected constant. With an observed 
value j of $n_k$, either point estimation or interval estimation of the parameter
N may be made. One possible procedure for point estimation, the so-called
maximum-likelihood procedure, involves the use of (1), where N is chosen
so that the value of $P_N\{n_k = j\}$ is maximized. Procedure for interval estimation
will be described in the section on confidence intervals.
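
A maximum-likelihood estimate of this kind can be obtained by direct search. The following is a minimal sketch under stated assumptions: the function names and the search ceiling `n_max` are introduced here, and only the part of the probability of the observed j that depends on N, namely $N!/[N^{j}(N - j + k)!]$, is maximized, since the remaining factor is the same for every N.

```python
from math import lgamma, log


def log_likelihood(N, j, k):
    """Logarithm of the N-dependent factor N! / (N**j * (N - j + k)!)."""
    return lgamma(N + 1) - j * log(N) - lgamma(N - j + k + 1)


def mle_N(j, k, n_max=10_000):
    """Integer N maximizing the likelihood when the kth recurrence falls on trial j.

    N cannot be smaller than j - k, the number of distinct SVR's observed;
    n_max is an arbitrary ceiling for the search (an assumption of this sketch)."""
    distinct = j - k
    if distinct < 1:
        raise ValueError("trial number j must exceed k")
    return max(range(distinct, n_max + 1),
               key=lambda N: log_likelihood(N, j, k))


print(mle_N(20, 5))   # estimate when the fifth conditioned response occurs on trial 20
```

The likelihood tends to zero as N grows without bound (it behaves like $N^{-k}$), so a finite maximum exists; the ceiling merely bounds the scan.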

Another approach to the evaluation of N consists of extending the 
experiment to the stage at which the animal performs a conditioned response 
unfailingly on every trial. From the postulates it follows that those trials on 
which no conditioned responses occur are trials characterized by novel 
SVR’s. Hence, the total number of trials on which no conditioned responses
occurred is directly equivalent to N. The practical problem in this approach 
is clearly that of defining the criterion of conditioning. The following discus- 
sion deals with tests of hypotheses concerning the parameter N. While the 
subject is one of intrinsic interest, the discussion is introduced at this point 
as a preliminary step in the development of a procedure for interval estima- 
tion. 

Let $N_0 < N_1$ be two specified positive integers, and designate by $H_0$
the hypothesis that $N = N_0$ and by $H_1$ the alternative hypothesis that
$N = N_1$. A test of $H_0 : N = N_0$ against $H_1 : N = N_1$ involves a single alternative
and a single observation. Accordingly, construction of such a test
consists of selecting a critical region such that

$$\frac{P_{N_1}\{n_k = j\}}{P_{N_0}\{n_k = j\}} \ge c, \tag{10}$$

where c is chosen so that the probability of the critical region under $H_0$ is $\theta$.
Substitution from (1) into (10), with the factors not involving N canceling
out, yields

$$\frac{N_0^{\,j}\, N_1!\, (N_0 - j + k)!}{N_1^{\,j}\, N_0!\, (N_1 - j + k)!} = L(j). \tag{11}$$

It will be noted that

$$\frac{L(j+1)}{L(j)} = \frac{N_0}{N_1} \cdot \frac{N_1 - j + k}{N_0 - j + k} > 1, \tag{12}$$

the last inequality being a consequence of the fact that $N_1 > N_0$ and $j > k$.
Hence L(j) is a monotonically increasing function of j, which implies that
the most powerful critical region is one for which $n_k$ exceeds a selected constant.
This is a uniformly most powerful test, for the specified probability
of Type I error, of the hypothesis $H_0 : N = N_0$ against $H_1 : N > N_0$, since
the constant depends only on $N_0$. It is also a uniformly most powerful test of
the hypothesis $H_0 : N \le N_0$ against $H_1 : N > N_0$, since the probability of
Type I error for any $N < N_0$ is less than that for $N_0$ under this test.
The criterion stated in (10) is thus equivalent to the rule to reject
$H_0 : N = N_0$ if $n_k$ takes on a value j such that

$$j \ge b, \tag{13}$$

where b is chosen so that

$$P_{N_0}\{n_k \ge b\} = \theta. \tag{14}$$

Similarly, a uniformly most powerful test of the hypothesis $H_0 : N = N_0$
(or $N \ge N_0$) against $H_1 : N < N_0$ is given by the rule to reject $H_0 : N = N_0$
if $n_k$ takes on a value j such that

$$j \le a, \tag{15}$$

where a is chosen so that

$$P_{N_0}\{n_k \le a\} = \theta. \tag{16}$$

For testing the hypothesis $H_0 : N = N_0$ against $H_1 : N \ne N_0$, a uniformly
most powerful test of size $\theta$ does not exist. An approximation to the best
unbiased test is to choose two numbers a and b such that

$$P_{N_0}\{n_k \le a\} = P_{N_0}\{n_k \ge b\} = \theta/2, \tag{17}$$

and to reject $H_0 : N = N_0$ by the criteria stated in (13) and (15).
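
Because $n_k$ is discrete, critical values that give a tail probability of exactly the chosen size will seldom exist. The sketch below, an illustration with names introduced here (`cdf_nk`, `upper_critical`, `lower_critical`), takes the conservative course of choosing the critical values so that each tail probability does not exceed the nominal size. The cumulative distribution is computed by following the number of distinct SVR's evoked from trial to trial, which is an equivalent route to the occupancy formula and is used here only for brevity.

```python
def cdf_nk(N, k, j):
    """P_N{n_k <= j}: at most j - k distinct SVR's have been evoked in j trials."""
    if j <= k:
        return 0.0
    top = min(N, j)                  # the count of distinct SVR's cannot exceed this
    dist = [0.0] * (top + 1)         # dist[d] = P(d distinct SVR's so far)
    dist[0] = 1.0
    for _ in range(j):
        new = [0.0] * (top + 1)
        for d, p in enumerate(dist):
            if p:
                new[d] += p * d / N                  # recurrence of an old SVR
                if d < top:
                    new[d + 1] += p * (N - d) / N    # a novel SVR
        dist = new
    return sum(dist[: j - k + 1])


def upper_critical(N0, k, theta):
    """Smallest b with P{n_k >= b} <= theta under N = N0 (reject when j >= b)."""
    b = k + 1
    while 1.0 - cdf_nk(N0, k, b - 1) > theta:
        b += 1
    return b


def lower_critical(N0, k, theta):
    """Largest a with P{n_k <= a} <= theta under N = N0 (reject when j <= a).

    A returned value of k means that no lower rejection region of this size exists."""
    a = k
    while cdf_nk(N0, k, a + 1) <= theta:
        a += 1
    return a


N0, k, theta = 200, 2, 0.20
print(lower_critical(N0, k, theta / 2), upper_critical(N0, k, theta / 2))
```

The final line applies the two-sided rule with half the size in each tail; the hypothesis is rejected when the observed trial number falls at or below the first value or at or above the second.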


V. Confidence Intervals 


The best one-sided confidence limits on N can be easily constructed.
Thus, the rule of (13) and (14) is equivalent to the rule of accepting
$H_0 : N = N_0$ if $n_k$ takes on a value j such that

$$j \le b - 1, \tag{18}$$

where b is chosen so that

$$P_{N_0}\{n_k \le b - 1\} = 1 - \theta. \tag{19}$$

With $\theta$ selected arbitrarily, values of b are calculated for different values
$N_0$ by means of (2), where j is the symbol for b − 1. For each $N_0$ let $b(N_0)$
designate the value thus calculated. Each value $b(N_0) - 1$ is now plotted
as the ordinate above the corresponding value $N_0$ of N on the horizontal
axis. Then, for each value $N_0$ of N,

$$P_{N_0}\{n_k \le b(N_0) - 1\} = 1 - \theta. \tag{20}$$

Let $L(n_k)$ be the inverse of $b(N_0) - 1$, thus designating N as a function
of $n_k$. Since $b(N_0) - 1$ is the maximal value corresponding to $N_0$, given
the condition (19), clearly, for a specified $n_k$, $L(n_k)$ is the minimal value;
i.e., for the selected $\theta$, the value $b(N_0) - 1$ of $n_k$ may be obtained only with
those values of N which are equal to or exceed $L(n_k)$. Thus, (20) is equivalent
to

$$P_{N_0}\{N_0 \ge L(n_k)\} = 1 - \theta. \tag{21}$$

The last equation implies that, whatever the true value $N_0$ of N, the
probability is $1 - \theta$ that the chance variable $n_k$ will come up so that
$N_0 \ge L(n_k)$. In other words, $L(n_k)$ is a one-sided confidence limit on N of
confidence coefficient $1 - \theta$.

Similarly, for each value $N_0$ of N a value $a(N_0)$, corresponding to the
a of inequality (15), is calculated. The values $a(N_0) + 1$, plotted as before,
yield the relation

$$P_{N_0}\{n_k \ge a(N_0) + 1\} = 1 - \theta, \tag{22}$$

and its inverse form

$$P_{N_0}\{N_0 \le U(n_k)\} = 1 - \theta, \tag{23}$$

the last equation implying that $n_k$ will come up so that the probability of
$U(n_k)$ being equal to or greater than the true value of N is always $1 - \theta$,
whatever be that true value of N.

The methods for constructing one-sided confidence limits are best
because the corresponding tests of hypotheses from which they are derived
are uniformly most powerful. Since there is no uniformly most powerful
test of $H_0 : N = N_0$ against $H_1 : N \ne N_0$, the two-sided confidence interval
may be constructed by a procedure approximating an unbiased test, or one
with the shortest acceptance region. For each value $N_0$ of N a value $a(N_0)$
and a value $b(N_0)$, corresponding to a and b of (17), are calculated. These
are such that, whatever the true $N_0$,

$$P_{N_0}\{a(N_0) + 1 \le n_k \le b(N_0) - 1\} = 1 - \theta, \tag{24}$$

and, inversely,

$$P_{N_0}\{L(n_k) \le N_0 \le U(n_k)\} = 1 - \theta. \tag{25}$$

Since $n_k$ is a discrete chance variable, it is not always possible to find
values of a and b which yield exactly $\theta$ for each $N_0$. A conservative procedure
of designating the limiting values $L(n_k)$ and $U(n_k)$ of (25) would be to state
the integral values of N which lie nearest the limits, outside the interval
defined by $n_k$ and $1 - \theta$.
Figure 1 gives values of $j_U$ and $j_L$ for $1 \le k \le 5$, such that

$$P_{N_0}\{n_k \le j_U\} = P_{N_0}\{n_k \ge j_L\} = .90, \tag{26}$$

with $j_U$ and $j_L$ chosen so as to make this probability as little greater than .90
as possible. The upper limits are designated U and the lower L. The number
preceding U or L refers to the value of k for which the limit was computed.
Thus,

$$P_{200}\{n_2 \le j_U\} = .90 \tag{27}$$

is interpreted as “the probability is .90 that, given N = 200, the second
recurrence of an SVR will take place no later than the $j_U$th trial.”

Locating 200 on the horizontal axis, one proceeds vertically to the curve
2U, and thence horizontally to the vertical axis which is met at $n_2 = 39$;
the latter is the value which satisfies the condition (27). Similarly, the lower
limit of $n_2$ may be evaluated, showing that

$$P_{200}\{n_2 \ge 17\} = .90. \tag{28}$$

FIGURE 1
Confidence Limits on N

Combining (27) and (28), one obtains an approximation to be used in
constructing the best unbiased confidence interval on N, defined by the
parameter 200 and the confidence coefficient .80. Thus,

$$P_{200}\{17 \le n_2 \le 39\} = .80. \tag{29}$$

The same procedure is followed in locating confidence intervals for other
values of N and k within the limits of the chart.

In a typical conditioning situation N is unknown, and the problem is 
one of estimating this parameter from the variates $n_k$. The chart shown in
Figure 1 fulfills this function if used inversely. Assuming, for example, that 
the fifth conditioned response occurs on trial 20, a horizontal line is drawn 
from the vertical axis at 20, and ordinates are dropped from its intersections 
with 5U and 5L. These meet the horizontal axis at 25 and 60. The estimated 
interval on N is now stated as follows: ‘‘Because the fifth conditioned response 
occurred on trial 20, it may be stated with 80% confidence that the interval 
25 to 60 includes the true value of the total number of specific vigilance 
reactions possessed by the organism in the given situation.” 
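
The inverse use of the chart amounts to collecting every value of N that the observed trial number would not lead one to reject. The sketch below carries out that inversion numerically. It is an illustration under stated assumptions, not the author's procedure for drawing the chart: the function names and the search ceiling `n_max` are introduced here, the two tails are each given half of the chosen size, and the limits so obtained need not coincide with values read from a printed chart.

```python
def cdf_nk(N, k, j):
    """P_N{n_k <= j}: at most j - k distinct SVR's have been evoked in j trials."""
    if j <= k:
        return 0.0
    top = min(N, j)
    dist = [0.0] * (top + 1)         # dist[d] = P(d distinct SVR's so far)
    dist[0] = 1.0
    for _ in range(j):
        new = [0.0] * (top + 1)
        for d, p in enumerate(dist):
            if p:
                new[d] += p * d / N
                if d < top:
                    new[d + 1] += p * (N - d) / N
        dist = new
    return sum(dist[: j - k + 1])


def confidence_limits(j, k, theta, n_max=5000):
    """Limits (L, U) on N of nominal confidence 1 - theta, given n_k = j.

    L is the smallest N for which P_N{n_k >= j} exceeds theta/2;
    U is the largest N for which P_N{n_k <= j} exceeds theta/2."""
    tail = theta / 2.0
    n_min = j - k                    # number of distinct SVR's actually observed
    lower = next(N for N in range(n_min, n_max + 1)
                 if 1.0 - cdf_nk(N, k, j - 1) > tail)
    upper = n_min
    for N in range(n_min, n_max + 1):
        if cdf_nk(N, k, j) > tail:
            upper = N
        else:
            break
    return lower, upper


# Fifth conditioned response on trial 20, confidence coefficient .80.
print(confidence_limits(20, 5, 0.20))
```

The scan relies on the fact that larger repertories make recurrences less frequent, so that each tail probability moves in one direction as N increases.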
VI. A Test of the Model


The model may be tested by observing whether, for a given N, the
joint distribution of $n_i$’s is within the region of acceptance selected in such
a manner that its total probability is $1 - \theta$. While this method of testing is
highly desirable, it is not feasible at the present stage of the development of
the model, inasmuch as it depends on the knowledge of the true value of N.
It has been possible, however, to construct a test which does not depend on 
the knowledge of this parameter. The procedure for such a test is described 
below. 

In Section III it was shown that, if the model is true, the statistic $n_k$ is
sufficient, and that, consequently, the conditional probability distribution

$$P_N\{n_1 = j_1, n_2 = j_2, \cdots, n_{k-1} = j_{k-1} \mid n_k = j_k\}$$

does not depend on N.
This distribution is given by

$$\frac{P_N(j_1, j_2, \cdots, j_k)}{P_N(j_k)} = \frac{\prod_{\nu=1}^{k-1} S_\nu}{\sum_{S_\nu} \prod_{\nu=1}^{k-1} S_\nu}, \tag{30}$$

where the expression on the right is derived from (8) with $S_\nu = j_\nu - \nu$, and the
summation in the denominator is taken over all possible sets of values of $S_\nu$ such that

$$1 \le S_1 \le \cdots \le S_\nu \le \cdots \le S_{k-1} \le j_k - k. \tag{31}$$

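An appropriate summation of the numerator in (30) yields the conditional
probability distribution of a single variable $n_i$. Thus,

$$P\{n_i = j_i \mid n_k = j_k\} = \frac{S_i \Big(\sum_{S_\nu} \prod_{\nu=1}^{i-1} S_\nu\Big) \Big(\sum_{S_\nu} \prod_{\nu=i+1}^{k-1} S_\nu\Big)}{\sum_{S_\nu} \prod_{\nu=1}^{k-1} S_\nu}, \tag{32}$$

where sums in the numerator are taken over all possible sets of values of
$S_\nu$ such that, in the summation of the first product,

$$1 \le S_1 \le \cdots \le S_{i-1} \le S_i, \tag{33}$$

and, in the summation of the second product,

$$S_i \le S_{i+1} \le \cdots \le S_{k-1} \le j_k - k. \tag{34}$$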


Finally, a summation of (32) over the indicated values of the variable
in the numerator yields the cumulative conditional probability distribution
of $n_i$,

$$P\{n_i \le j_i \mid n_k = j_k\} = \frac{\sum_{S_i=1}^{j_i-i} \Big[S_i \Big(\sum_{S_\nu} \prod_{\nu=1}^{i-1} S_\nu\Big) \Big(\sum_{S_\nu} \prod_{\nu=i+1}^{k-1} S_\nu\Big)\Big]}{\sum_{S_\nu} \prod_{\nu=1}^{k-1} S_\nu}, \tag{35}$$

with conditions for the other sums defined by (31), (33), and (34).
Equation (35) provides a direct test of the model. Thus, if the model is
true, the observed value x of $n_i$ will be such that, with probability $1 - \theta$,

$$\frac{\theta}{2} \le P\{n_i \le x \mid n_k = j_k\} \le 1 - \frac{\theta}{2}. \tag{36}$$

Selecting k = 5, and designating the critical area $\theta = .20$, test observations
were made on four goats in a conditioning situation. An auditory signal
served as the neutral stimulus, and flexion of the right foreleg, unconditionally
evoked by an electric shock, as the response. The results are presented in
Table 1.


TABLE 1

Ordinal Numbers of Trials on Which CR's Occurred
and Their Cumulative Conditional Probabilities

             H               L                  Y               F
       $n_5 = 16$      $n_5 = 17$         $n_5 = 20$      $n_5 = 30$
  i    $j_i$    P      $j_i$    P         $j_i$    P      $j_i$    P
  1      9     .90       7   [illegible]    8     .57      21     .97
  2     10     .65      12     .85         14     .80      22     .83
  3     11     .35      14     .82         16     .70      23     .53
  4     15    1.00      15     .54         18     .60      28     .73

















Capital letters in the top row of Table 1 are code letters of the four
animals. Beneath each code letter is the number of the trial on which that
animal gave the fifth CR. The column labeled i contains ordinal numbers of
the first four CR’s. Columns labeled $j_i$ give the numbers of trials on which ith
CR’s occurred. Columns labeled P give the cumulative conditional probabilities
calculated by means of (35). Thus, by way of illustration, goat Y
gave its fifth CR on trial 20, and its third CR on trial 16. The probability
that an animal which gave its fifth CR on trial 20 should have given its
third CR no later than trial 16 is .70, which is well within the arbitrarily
selected acceptance interval of .10 to .90.

Inspection of Table 1 shows that only two of the sixteen probabilities
fail to meet the criterion of acceptance. These are probabilities computed
for the fourth CR of H and the first CR of F. The extreme value of H may,
however, be explained by the fact that the conditional probability of the
fourth CR having occurred exactly on trial 15, given that the fifth CR took
place on trial 16, is .49. Of course, the four tests corresponding to different
rows in any column of Table 1 are not independent of each other; however,
as a rough indication of the validity of the model, Table 1 presents a convincing
demonstration.
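
The cumulative conditional probabilities can be recomputed by brute force. The sketch below is an illustration, with the function name `conditional_cdf` introduced here; instead of evaluating the nested sums separately, it enumerates every admissible set $1 \le S_1 \le \cdots \le S_{k-1} \le j_k - k$, weights each set by the product of its members, and accumulates the weight of those sets for which the ith recurrence falls no later than trial $j_i$. This is the same ratio, arranged for a machine rather than for hand computation.

```python
from fractions import Fraction
from itertools import combinations_with_replacement
from math import prod


def conditional_cdf(i, j_i, j_k, k):
    """P{n_i <= j_i | n_k = j_k} for 1 <= i <= k - 1, free of the parameter N."""
    limit = j_k - k                  # largest admissible value of S_(k-1)
    total = 0
    favorable = 0
    # Each tuple is a non-decreasing set (S_1, ..., S_(k-1)) with S_v = n_v - v.
    for s in combinations_with_replacement(range(1, limit + 1), k - 1):
        weight = prod(s)
        total += weight
        if s[i - 1] <= j_i - i:      # the ith recurrence is on trial j_i or earlier
            favorable += weight
    return Fraction(favorable, total)


# Goat Y: fifth CR on trial 20, third CR no later than trial 16 (quoted as .70).
print(round(float(conditional_cdf(3, 16, 20, 5)), 2))

# Goat H: fourth CR exactly on trial 15, given the fifth on trial 16 (quoted as .49).
exactly_15 = conditional_cdf(4, 15, 16, 5) - conditional_cdf(4, 14, 16, 5)
print(round(float(exactly_15), 2))
```

Enumeration is practicable here because, for k = 5 and the trial numbers of Table 1, the admissible sets number only a few thousand.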

At this point it may be noted that selecting regions of acceptance for
each $n_i$ separately rather than designing a single region for their joint distribution
actually increases the size of the test; i.e., the total probability that
at least one of the rows in any given column would give a result leading to
rejection of the model when it is true is greater than $\theta$. The effect is illustrated
in Figure 2, where the test is applied to the case of $n_1$ and $n_2$, given $n_k$, $k > 2$.


FIGURE 2
Schematic Representation of Regions of Acceptance


Since $n_2$ is greater than $n_1$, the joint probabilities are greater than zero
above the diagonal, and zero elsewhere. R represents the region of acceptance
in which the sum of joint probabilities is $1 - \theta$. Construction of R involves
laborious mathematical computations which would be hardly justifiable in
the absence of specified alternatives to the model. Since a test of the model
at this stage is intended merely as a detecting device of obvious fallacies,
if any, the substitute procedure, defined by (35), consists of selecting separate
regions of acceptance for $n_1$ and $n_2$, such that in each case the sum of marginal
probabilities excluded on each side of the region is $\theta/2$. This construction
of the regions of acceptance yields probability intervals which are approximately
the shortest possible, since the ordinates which form the limits of this
region are approximately equal. Such a test should give good power against
most reasonable alternatives to the model.

In Figure 2 the area bounded by the vertical lines A and B is the region
of acceptance for $n_1$, and the sum of joint probabilities within this area is
$1 - \theta$. Similarly, the area bounded by the horizontal lines C and D is the
region of acceptance for $n_2$, and the sum of joint probabilities within this
area is also $1 - \theta$. With this construction the condition for non-rejection is
that both $n_1$ and $n_2$ occur within the rectangle bounded by the four lines.
It is clear, however, that the sum of joint probabilities within this rectangle
is less than $1 - \theta$. Hence, the size of the test is actually greater than $\theta$; or,
with a diminished probability of Type II error, the power of the test is somewhat
higher than that of the precise test based on the region of acceptance R.


VII. Conclusions 


The mathematical model for conditioning has been presented primarily 
as an illustration of a technique for the treatment of intervening variables, 
rather than as a substitute for the existing systems. Since the model does not 
demand rigor in the definition of these variables, it has possibilities of adapta- 
tion to theories of behavior which regard autonomous central processes as 
crucial. The author is currently engaged in one such adaptation, extending 
the model to include problems in discrimination learning. The extension 
should furnish a method for treating Krechevsky’s “hypotheses” (8) as 
stochastic variables, thus providing a testable alternative to Spence’s model 
(18). 

In another application, the model provides, in the value of N, a measure 
for the study of individual differences. Organisms requiring many trials to 
reach a stipulated criterion will, on the average, yield larger estimates of N. 
Since N is the sole parameter involved, the phenomenon of slow learning 
may be interpreted as inability on the part of the organism to restrict its 
range of vigilance to the situation at hand. Since, however, the author knows 
of no way to test the latter inference, it must remain, at least for the present, 
on the level of intuitive generalization. On the other hand, estimates of N, 
based on a limited number of trials, are intended to provide uniform quantita- 
tive indices for a variety of psychological investigations, ranging from selection 
of stratified samples to problems of heredity and environment. Conversely, 
for the same organism, the parameter may furnish a comparative estimate 
of the efficacy of various learning situations. 

Studies involving large populations are often prohibitive because of time 
and effort required to train each subject to a criterion of mastery. Instead, in 
training each subject to a predetermined number of conditioned responses, 
one is able to make a reasonably accurate quantitative estimate of the sub- 
ject’s susceptibility to training, with a substantial economy in labor. Further- 
more, the potentialities of the parameter for the construction of gradients of 
similarity may lead eventually to a reexamination of the phenomena of 
generalization and pseudo-conditioning in the light of mediating factors 
susceptible to systematic treatment. 

The model may be regarded from one of two contrasting theoretical 
positions. One may either take the stand advocated by Skinner (17) and
view the parameter N purely as “a formal representation of the data reduced
to a minimal number of terms,” or one may follow the course suggested by
Pratt (15) and postulate independent existence of neurophysiological events 
corresponding to this parameter. The author leans towards the latter point of 
view because of its potentiality as a source of future hypotheses. Moreover, 
improved techniques of observing and recording specific vigilance reactions 
may ultimately lead to an independent estimate of the parameter N, thus 
serving as the second of “at least two methods” stipulated by Bridgman (1)
“of getting to the terminus.” 

A study by Shaw (7) some twenty years ago is frequently cited by
social scientists to support the generalization that groups are superior to 
individuals in problem-solving. Shaw suggests that personal interaction 
within the group is responsible for the superior performance of groups. This 
article re-examines her data in the light of two models which propose that 
the difference in quality of solution between group and individual performance 
is solely a matter of ability. It is shown that Shaw’s data may be considered 
to have been an outcome of behavior postulated by the models. Since Shaw’s 
observations relate to a special population and to special kinds of problems, 
the proposed models may not be appropriate under differing experimental 
conditions. In fact, Lorge et al. (4) have indicated that experimental demon- 
stration of the superiority of groups over individuals in problem-solving 
depends not only on the kind of group but also on the kind of problem to be 
solved. In addition, the diversity of transfer of training for groups and for 
individuals is considered. 


Introduction 


Since this article treats only the data from the first half of the Shaw 
experiments, a brief description of this part will be given. Three problems (3), 
each a well-known mathematical puzzle involving the transport of objects 
under certain constraints, were given to groups and to individuals. The 
first, known historically as the Tartaglia, requires the transport of three 
jealous husbands and their three beautiful wives across a river in a boat 
holding just three at a time, under the constraint that no husband will allow 
his wife in the presence of another man unless he is also present, and with 
the specification that only husbands can row. The second problem, the 
historical Alcuin, is similar in that it requires the transport of three mission- 
aries and three cannibals in a boat carrying two at a time under the constraint 
that missionaries may never be outnumbered by cannibals, and with the 
specification that all missionaries and just one cannibal have mastered the 
art of rowing. The third problem, the historical Tower of Hanoi, or disc
problem, is similar to the previous two in that it requires the transport of 
three graduated discs, stacked in order of size, to another position via an 
intermediate way station, under the constraint that a larger disc may never 
be placed on a smaller one, with the specification that only one disc may be
moved at a time. 

Shaw’s subjects were students in a social psychology class which had 
been divided into halves: one half being formed at random into ad hoc like-sex, 
four-member groups, and the other half serving as individuals, i.e., as controls. 
Thus, the performances of five groups were contrasted with those of twenty- 
one individuals. Each group and each individual was asked to solve all three 
problems in the same sequence. 

A criterion for comparing group and individual performance is the 
contrast between the proportion of individuals and the proportion of groups 
successful in the solution of each problem. For Shaw’s three problems, the 
proportions of individuals and groups mastering each solution are given 
in Table 1 (Columns 1 and 3). When, for each problem separately, the differ- 
ence between proportions of success in groups and in individuals is tested, 
using an upper one-sided .05 critical region, the data for Problems I and II 
support the generalization of group superiority, but the difference between 
groups and individuals for Problem III is not statistically significant. The 
statistical test (2, 6) of the hypothesis that two proportions are equal is 


$$ z = \frac{\theta_G - \theta_I}{\sqrt{\dfrac{1}{N_I} + \dfrac{1}{N_G}}}, \tag{1} $$

where $\theta = 2 \arcsin \sqrt{p}$, p = proportion of success, N = sample size,
and the subscripts I and G refer to individuals and to groups, respectively.
The function z is approximately normally distributed with zero mean and 
unit variance under the hypothesis tested. The results of this analysis could 
be used to support Shaw’s conclusion (7, p. 504): “Groups seem assured of a 
much larger proportion of correct solutions than individuals do.” 
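
The test in equation (1) can be sketched numerically as follows. This is a minimal illustration, not the author's own computation: the success counts are read from Table 1, the function names are hypothetical, and 1.645 is assumed as the upper one-sided .05 critical value of the normal distribution.

```python
import math

def theta(p):
    """Arcsine transform 2*arcsin(sqrt(p)) used in equation (1)."""
    return 2.0 * math.asin(math.sqrt(p))

def z_two_proportions(p_g, n_g, p_i, n_i):
    """z of equation (1): difference of transformed proportions over its standard error."""
    return (theta(p_g) - theta(p_i)) / math.sqrt(1.0 / n_i + 1.0 / n_g)

def upper_tail(z):
    """Upper-tail probability of the standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# (group successes, individual successes) per problem, as read from Table 1;
# there are 5 groups and 21 individuals.
shaw = {"I": (3, 3), "II": (3, 0), "III": (2, 2)}
Z_CRIT = 1.645  # assumed upper one-sided .05 critical value

results = {}
for name, (g, i) in shaw.items():
    z = z_two_proportions(g / 5, 5, i / 21, 21)
    results[name] = (z, upper_tail(z), z > Z_CRIT)
    print(f"Problem {name}: z = {z:.2f}, one-sided P = {upper_tail(z):.4f}, "
          f"significant = {z > Z_CRIT}")
```

Under these assumptions the sketch flags Problems I and II as significant and Problem III as not significant, in line with the statement in the text.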

Of the five groups, however, two solve none of the problems and two
solve all problems. Of the twenty-one individuals, none solves more than
one of the three problems. The fact that some groups solved none and some
groups solved all the problems suggests the hypothesis that the observed
group superiority is due to the abilities of the members of the group rather
than personal interaction. Such an hypothesis may be expressed in terms of
two ability models: (A) group superiority is a function only of the ability of 
one or more of its members to solve the problem without taking account of 
the interpersonal rejection and acceptance of suggestions among its members; 
(B) group superiority is a function only of the pooled abilities of its members. 
The latter model, B, implies that any problem may be composed of, and 
solved in, two or more stages. Model B, of course, reduces to Model A for 
one-stage problems. 

Model A


Under Model A the probability of a group solution is the probability 
that the group contains one or more members who can solve the problem. 
This non-interactional ability model for any specific problem can be expressed 
mathematically as follows: Let 


$P_G$ = the probability that a group of size k solve the problem;
$P_I$ = the probability that an individual solve the problem.
Then

$$ P_G = 1 - (1 - P_I)^k, \tag{2} $$

where $P_G$ and $P_I$ are population parameters considered fixed for the specific
problem and the specific population.

Confidence in the tenability of this non-interactional ability model
can be decided by testing it on the basis of sample observations. Assume
$N_G$ observations of group performance and $N_I$ of individual performance.
Then sample estimates $p_G$ and $p_I$ may be obtained, where $p_G$ and $p_I$ are the
ratios of the observed successes to attempts for groups and for individuals,
respectively; $p_G$ should be compared with $p_{GA}$ (or equivalently, $p_I$ with
$p_{IA}$), where

$$ p_{GA} = 1 - (1 - p_I)^k, \tag{3} $$

or equivalently

$$ p_{IA} = 1 - (1 - p_G)^{1/k}. \tag{3a} $$
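
As a worked illustration of equations (3) and (3a), the following sketch evaluates both Model A estimates for the three Shaw problems. The group size k = 4 and the observed proportions are taken from the text and Table 1; the function names are hypothetical.

```python
def p_ga(p_i, k):
    """Equation (3): group success rate implied by Model A from the individual rate."""
    return 1.0 - (1.0 - p_i) ** k

def p_ia(p_g, k):
    """Equation (3a): individual success rate implied by Model A from the group rate."""
    return 1.0 - (1.0 - p_g) ** (1.0 / k)

K = 4  # members per group in the Shaw experiment
# (p_I, p_G) observed for each problem
observed = {"I": (3 / 21, 3 / 5), "II": (0 / 21, 3 / 5), "III": (2 / 21, 2 / 5)}

model_a = {name: (p_ga(pi, K), p_ia(pg, K)) for name, (pi, pg) in observed.items()}
for name, (ga, ia) in model_a.items():
    print(f"Problem {name}: p_GA = {ga:.2f}, p_IA = {ia:.2f}")
```

The values obtained this way agree, to two decimals, with the Model A columns of Table 1.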


The observed difference $(p_G - p_{GA})$ certainly can be used as a test of
the model, for the smaller the observed difference, the more tenable is the
model and, the larger the observed difference, the less tenable it is. If an
$\alpha$ level of significance is used, then the model would be rejected if

$$ \Pr\{(p_G - p_{GA}) > O_d\} \le \alpha $$

and accepted otherwise, where $O_d$ is the observed difference. A one-sided
test is used since negative personal interaction (an unable majority preventing
an able minority from solving the problem) is not anticipated in the Shaw
groups, and thus the test is made most powerful against all alternatives
indicating positive personal interaction. That is, if positive interaction does
exist, the probability of rejecting Model A is higher than the probability
given by a two-sided test of the same size. A similar argument holds for
$(p_{IA} - p_I)$, since it is an equivalent test.

To test the tenability of the model, the distribution of $(p_G - p_{GA})$ must
be obtained. Although $p_G$ and $p_{GA}$ are independently distributed proportions,
the distribution of their difference is no longer related to the standard
distribution of the difference of two binomials since $p_{GA}$ is not a binomial;
$p_{GA}$ is a function of $p_I$, which is a binomial. This complicates obtaining the
exact distribution of $(p_G - p_{GA})$ either in closed form or in a form such that
existing tables may be used. Since sample sizes are small, however, it is not too
tedious to compute the exact probabilities of all differences larger than the
observed difference under the assumptions that (1) the model holds and
(2) the nuisance parameter (either $P_G$ or $P_I$) is replaced by a sample estimate.
It is interesting to note that


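$$ E(p_{GA}) = 1 - (1 - P_I)^4 - \frac{6P_I(1 - P_I)^3}{N_I} + \frac{P_I(1 - P_I)^2(4 - 11P_I)}{N_I^2} - \frac{P_I(1 - P_I)(1 - 6P_I + 6P_I^2)}{N_I^3} $$

and

$$ \sigma^2_{p_{GA}} = \frac{16P_I(1 - P_I)^7}{N_I} + \frac{f_1(P_I)}{N_I^2} + \frac{f_2(P_I)}{N_I^3} + \cdots + \frac{f_6(P_I)}{N_I^7}, $$

where $f_j(P_I)$, j = 1, 2, ..., 6, are eighth-degree polynomials in $P_I$. Thus,
for large $N_I$, $p_{GA}$ is an unbiased estimate of $P_G$ and its variance is

$$ \sigma^2_{p_{GA}} = \frac{16P_I(1 - P_I)^7}{N_I}. $$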
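
The expression for the expected value of the Model A estimate (for k = 4) can be checked by brute force. The sketch below sums over the binomial distribution of the individual successes with exact rational arithmetic and compares the result with the closed form; it also compares the exact variance with the leading term $16P_I(1 - P_I)^7/N_I$. The parameter values (P_I = 1/7, N_I = 21 or 210) are illustrative assumptions, and the function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def binom_pmf(x, n, p):
    """Binomial probability of x successes in n trials (exact when p is a Fraction)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def moments_p_ga(P, N, k=4):
    """Exact mean and variance of p_GA = 1 - (1 - X/N)^k for X ~ Binomial(N, P)."""
    values = [1 - (1 - Fraction(x, N)) ** k for x in range(N + 1)]
    probs = [binom_pmf(x, N, P) for x in range(N + 1)]
    mean = sum(w * v for w, v in zip(probs, values))
    var = sum(w * v * v for w, v in zip(probs, values)) - mean**2
    return mean, var

def series_mean_p_ga(P, N):
    """Closed form for the mean when k = 4, as reconstructed in the text."""
    Q = 1 - P
    return (1 - Q**4
            - 6 * P * Q**3 / N
            + P * Q**2 * (4 - 11 * P) / N**2
            - P * Q * (1 - 6 * P + 6 * P**2) / N**3)

def leading_variance(P, N):
    """Large-sample variance 16 P (1 - P)^7 / N."""
    return 16 * P * (1 - P) ** 7 / N

P_I = Fraction(1, 7)                      # illustrative value (3/21)
mean_21, var_21 = moments_p_ga(P_I, 21)
closed_21 = series_mean_p_ga(P_I, 21)
_, var_210 = moments_p_ga(P_I, 210)
ratio_210 = float(var_210 / leading_variance(P_I, 210))

print("exact mean == closed form:", mean_21 == closed_21)
print("exact / leading variance at N = 210:", round(ratio_210, 4))
```

Because the arithmetic is rational, the equality check on the mean is exact rather than approximate; the variance ratio only indicates how quickly the leading term takes over as the sample grows.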

For the three Shaw problems, there are six possible values for $p_G$ and
twenty-two possible values of $p_I$. In Problem I, for instance, the observed
difference $(p_G - p_{GA})$ is .14, where $p_{GA}$ is computed from formula (2) using
the value of $p_I$ reported by Shaw. It is necessary, therefore, to tabulate all
possible differences greater than the value .14. For these tabulated differences,
the probability of each is computed under the specified assumptions. The
probability for each difference is the product of the probabilities that the
$p_G$ and $p_{GA}$ involved in the difference do occur when the two assumptions
hold. The probability that a $p_{GA}$ occurs is equal to the probability that its
corresponding $p_I$ occurs. The probability for $p_G$ and $p_{GA}$ may be obtained
readily by reference to a binomial table (5). The sum of these products of
probabilities is the exact probability that an observed difference will exceed
.14. In Table 1, column five gives the exact probability, P, that the observed
difference $(p_G - p_{GA})$ will be exceeded by chance.
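
The enumeration just described can be sketched in a few lines. This is an illustration under stated assumptions rather than the author's tabulation: differences equal to the observed one are counted in the tail, the `fix` argument (a hypothetical name) selects which nuisance parameter is replaced by its sample estimate, and binomial probabilities are computed directly instead of being read from a table. Results may therefore differ slightly from Table 1.

```python
from math import comb

def binom_pmf(x, n, p):
    """Binomial probability of x successes in n trials."""
    return comb(n, x) * p**x * (1.0 - p) ** (n - x)

def exact_tail(obs_g, n_g, obs_i, n_i, k=4, fix="P_I"):
    """P(p_G - p_GA >= observed difference) when Model A holds.

    fix="P_I": P_I is set to p_I and P_G follows from equation (2).
    fix="P_G": P_G is set to p_G and P_I follows by inverting equation (2).
    """
    p_g, p_i = obs_g / n_g, obs_i / n_i
    if fix == "P_I":
        P_I = p_i
        P_G = 1.0 - (1.0 - P_I) ** k
    else:
        P_G = p_g
        P_I = 1.0 - (1.0 - P_G) ** (1.0 / k)
    observed = p_g - (1.0 - (1.0 - p_i) ** k)
    total = 0.0
    for g in range(n_g + 1):            # possible numbers of successful groups
        for i in range(n_i + 1):        # possible numbers of successful individuals
            diff = g / n_g - (1.0 - (1.0 - i / n_i) ** k)
            if diff >= observed - 1e-12:
                total += binom_pmf(g, n_g, P_G) * binom_pmf(i, n_i, P_I)
    return total

# Problem II: no individual succeeded, so P_G is the parameter replaced by its estimate.
p_problem_2 = exact_tail(3, 5, 0, 21, fix="P_G")
# Problem I, replacing P_I by p_I (one of the two choices the text allows).
p_problem_1 = exact_tail(3, 5, 3, 21, fix="P_I")
print(round(p_problem_2, 3), round(p_problem_1, 2))
```

With these conventions the Problem II probability comes out at about .029, the figure given in Table 1; the Problem I value depends on which parameter is fixed and on the tail convention.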

An approximation to the exact probability can be made when $p_I$ is
small enough so that $p_{GA}$ can be approximated by $kp_I$, for then

$$ 2 \arcsin \sqrt{kp_I} \quad \text{and} \quad 2 \arcsin \sqrt{p_G} $$

are approximately normally distributed with variances

$$ \frac{1}{N_I} \quad \text{and} \quad \frac{1}{N_G}, \quad \text{respectively.} $$

Thus, if Model A holds,

$$ z = \frac{2 \arcsin \sqrt{p_G} - 2 \arcsin \sqrt{kp_I}}{\sqrt{\dfrac{1}{N_G} + \dfrac{1}{N_I}}} \tag{4} $$

is approximately normally distributed with zero mean and unit variance.
Some liberties have been taken in this approximation by assuming $kp_I$ to
be binomial since it can assume values greater than one. This assumption,
apparently, does not impair its usefulness for the Shaw experiments. In
Table 1, column six gives $P' = \Pr\{z > z_0\}$, where $z_0$ is the specific value for
z corresponding to the observed difference. Notice that the approximation
obviously gets better as $p_I$ decreases.

The hypothesized non-interactional ability Model A, thus, is rejected
for Problem II, but accepted as tenable for Problems I and III. For each of
the three problems, however, $p_G$ exceeds $p_{GA}$, suggesting that Model A
might be modified and improved.

TABLE 1

              p_I           p_IA    p_G          p_GA    P       P'
Problem I     3/21 = .14    .20     3/5 = .60    .46     .38     .48
Problem II    0/21 = .00    .20     3/5 = .60    .00     .029    .023
Problem III   2/21 = .095   .12     2/5 = .40    .33     .43     .48

p_I  = ratio of individual solutions to attempts
p_G  = ratio of group solutions to attempts
p_IA = estimate of P_I from Model A and observation p_G
p_GA = estimate of P_G from Model A and observation p_I
P    = probability (p_G - p_GA) is exceeded by chance under Model A when P_G or P_I
       is replaced by sample estimate
P'   = approximation of P replacing p_GA by kp_I

Stage-wise Solutions 


Within the framework of strict ability models, a modification of Model 
A may be made. Solution of eureka-type problems may be considered the 
consequence of pooling success at each of several stages of the problem. 
Shaw’s study, indeed, suggests the plausibility of such a stage-wise model. 
In reporting about the erroneous moves made by her subjects in solving 
Problem I she states that 13 different individuals made an error in the 
first move, four made an error in the third move, and one made an error in 
the fifth. For groups, however, she reports “No group erred on the first 
move; one erred on the third and one on the fourth.” 

Shaw’s description of the errors in Problem I suggests the importance 
of the first move, since 13 of the 21 individuals failed to make the correct 
first move. Each group, however, apparently had in it at least one member 
who made the first move successfully since none of the five groups erred on it.
Once the first move is accomplished, the difficulty of the problem changes. 
Five individuals who made the first move correctly did fail at subsequent 
stages, i.e., made the first correct move but failed at later moves. Two groups 
failed at some later move, suggesting that the group lacked at least one 
member who could accomplish some later move. 

Assuming that a problem is solved in s independent stages (not the
moves Shaw mentions, since such moves may be interrelated) and assuming
that Model A (equation 2) applies at each stage j, then

$$ P_G = \prod_{j=1}^{s} \left[ 1 - (1 - P_{Ij})^k \right], \qquad P_I = \prod_{j=1}^{s} P_{Ij}, \tag{5} $$

where s is the number of stages, and $P_{Ij}$ is the probability of success for an
individual at stage j. Now for the purpose of estimating s from the Shaw
data, consider the assumption that $P_{Ij}$ is the same for each stage; thus
$P_{Ij} = P_I^{1/s}$, then

$$ P_G = \left[ 1 - (1 - P_I^{1/s})^k \right]^s. \tag{5a} $$

This assumption may possibly be unrealistic, but it is necessary to provide
an estimate of s from Shaw’s data.

Substituting the estimates for $P_G$ and $P_I$ from Shaw’s Problems I
and III, s = 2 (to the nearest integer) for both problems; for Problem I,
s = 1.6; for Problem III, s = 1.5. Since the observed proportion of individual
solutions for Problem II is zero, s is indeterminate. (If for Problem II, $P_I$ is
replaced by $p_{IA}$ = .2, then s is very close to 1.)

It is not too difficult to rationalize the two-stage nature of the problems. 
For example, in the problem of the jealous husbands and their wives, the 
basic first stage requires the recognition that the boat, which may carry 
three, must be limited to taking just a husband and his wife across the river. 
Once this first stage is solved, the second and final stage is analogous to 
repetitious knitting. It is interesting to note that if it is assumed that $p_I$ = .05
and $p_G$ = .95 (an indication of overwhelming group superiority through
positive personal interaction among its members) then by (5a) s = 10 to
the nearest integer, an estimate even larger than the number of moves
required in some of the Shaw problems. While all possible pairs of the values
$p_G$ and $p_I$ have not been considered, an excessively large difference gives a
value of s inconsistent with a psychological analysis of the problem into
steps or stages.

Model B 

On both a probabilistic and a content basis, a two-stage problem may 
be reasonably inferred; assume now that Problems I, II, III are two-stage 
problems. For this situation, the population of individuals may be classified 
in the following way: 

Population Type   Ability                      Proportion in the Population
X_1               Solve both stages            P_1
X_2               Solve stage 1, not stage 2   P_2
X_3               Solve stage 2, not stage 1   P_3
X_4               Solve neither stage          P_4

Assuming this multinomial distribution of ability, appropriate ability
interaction within a group of four individuals can accomplish a solution
even though the group has in it no one member who can solve the problem
as a whole; for example, a group containing members of types $X_2$ and $X_3$
but none of type $X_1$. Consider all possible samples of four
$(X_i X_j X_k X_l)$ from this population. It is possible to enumerate all groups
of four that can interact to accomplish whole solutions solely by pooling their
abilities. Any group containing at least one individual $X_1$, or at least one
individual $X_2$ and one individual $X_3$ jointly, will be successful. The
probability of occurrence of each sample of four is given by the multinomial
distribution if $P_1$, $P_2$, $P_3$, $P_4$ are known. The sum of probabilities of the
occurrence of each group of four that can complete a stage-wise solution is the
probability of a group solution on the hypothesis of stage-wise pooling of
ability. Thus, under Model B, the probability of a group solution is obtained
by a special summation of the elements of the multinomial distribution.

Currently, not enough knowledge is available for estimating all the
probabilities $P_1$, $P_2$, $P_3$, and $P_4$. At best, in line with current knowledge
of the distribution of ability, the psychologist can merely supply reasonable
estimates for $P_2$, $P_3$, and $P_4$. In Shaw’s data, $P_1$ can be estimated from
the sample. This still leaves two degrees of freedom for choices since the
sum of the four probabilities is one.

Suppose these two free choices are subject to the restriction that they
closely reproduce $p_G$ and that they are not inconsistent with psychological
knowledge of the distribution of ability. For the kind of problems treated
by Shaw, psychological evidence indicates that the percentage of persons
who will fail on both stages will be larger than the percentage who can solve
both stages or any one stage. This, of course, does not uniquely determine
the four parameters but it is interesting to see that reasonable estimates do
exist. For example, if in Shaw’s Problem I, $P_1$ = .15 ($p_I$ = .1428), $P_2$ = .15,
$P_3$ = .15, and $P_4$ = .55, then $p_{GB}$ = .61 as contrasted with $p_G$ = .60. Here
$P_2$, $P_3$, and $P_4$ were guesses to reproduce the observed $p_G$. They are also
not inconsistent with the distribution of ability. Actually $p_G$ can be reproduced
exactly, but it was not considered necessary to alter the $P_i$’s slightly to
accomplish this since enough leeway has already been taken to reproduce a
sample value. Moreover, slight changes would not alter any decisions about
the reasonableness of the $P_i$’s. This argument also applies in the following
discussion of Problems II and III. Incidentally, $P_2 = P_3$ = .15 leads to
$P_1 + P_2 = P_1 + P_3$ = .30; this indicates that the probability of an individual’s
success in stage 1 and in stage 2 is .30. By (5a), $P_G$ = .58 as contrasted with
$p_G$ = .60, which suggests that the assumption which yields (5a) from (5)
is realistic after all.

Moreover, if in Shaw’s Problem I, $P_1$ = .15, $P_2$ = .30, $P_3$ = .30, and
$P_4$ = .25, a situation definitely inconsistent with the distribution of ability,
we get $p_{GB}$ = .92, a value noticeably different from $p_G$ = .60.

Similarly for Problem III, if $P_1$ = .10 ($p_I$ = .0952), $P_2 = P_3$ = .10,
$P_4$ = .70, then $p_{GB}$ = .42, as contrasted with $p_G$ = .40. Also referring to
(5a), $P_G$ = .35, as contrasted with $p_G$ = .40. In Problem II, $P_1$ = .2,
$P_2 = P_3$ = .05, and $P_4$ = .70 yields $p_{GB}$ = .61, as contrasted with $p_G$ = .60;
again referring to (5a), $P_G$ = .46, as contrasted with $p_G$ = .60. It should be
noticed here that this big difference arises from the use of $p_{IA}$ = .2, which
would lead to a one-stage problem if it were $p_I$. Notice that this is reflected
also in $P_2 = P_3$ = .05, for $P_2 = P_3$ = 0 is a one-stage model. Substitution
of the unrealistic observed $p_I$ = 0 would yield nonsensical results. This
information is presented in Table 2.
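
The "special summation" of the multinomial distribution used for Model B can be sketched as below. The weights are the trial values quoted in the text for Problems I, II, and III; the function names are hypothetical, and the second function evaluates equation (5a) in terms of the common per-stage probability $P_1 + P_2$ with s = 2, which is how the text appears to apply it here.

```python
from itertools import combinations_with_replacement
from math import factorial

def multinomial_prob(counts, probs):
    """Probability of drawing a group with the given counts of types X1..X4."""
    coef = factorial(sum(counts))
    p = 1.0
    for c, q in zip(counts, probs):
        coef //= factorial(c)
        p *= q**c
    return coef * p

def p_gb(probs, k=4):
    """Model B: sum the multinomial terms over groups that can pool a full solution."""
    total = 0.0
    for group in combinations_with_replacement(range(4), k):
        counts = [group.count(t) for t in range(4)]
        # Success needs an X1, or an X2 together with an X3.
        solves = counts[0] >= 1 or (counts[1] >= 1 and counts[2] >= 1)
        if solves:
            total += multinomial_prob(counts, probs)
    return total

def p_g_equal_stages(p_stage, s=2, k=4):
    """Equation (5a) written with the common per-stage success probability."""
    return (1.0 - (1.0 - p_stage) ** k) ** s

# Trial weights (P1, P2, P3, P4) quoted in the text.
weights = {
    "I": (0.15, 0.15, 0.15, 0.55),
    "II": (0.20, 0.05, 0.05, 0.70),
    "III": (0.10, 0.10, 0.10, 0.70),
}
table = {}
for name, w in weights.items():
    stage = w[0] + w[1]   # chance an individual solves stage 1 (same as stage 2 here)
    table[name] = (p_gb(w), p_g_equal_stages(stage))
    print(f"Problem {name}: p_GB = {table[name][0]:.3f}, (5a) P_G = {table[name][1]:.3f}")
```

For these weights the sketch agrees with the quoted Model B and (5a) figures to within rounding in the second decimal.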

It is interesting to note the premium gained by the two-stage model.
Model B can be made to account for most of the excess $(p_G - p_{GA})$ not
accounted for by Model A. If Model B holds, the excess is the probability
of a group solution when individual $X_1$ is not in the group of 4. For the
weights described, this is .13 for Problem I, and .077 for Problem III; these
should be compared with $(p_G - p_{GA})$ = .14 for Problem I, and .07 for Problem
III. For Problem II, the weights used lead to an excess of .017, but this is
just another reflection of the fact that the replacement of $p_I$ by $p_{IA}$ leads to
a one-stage problem.

TABLE 2

                Problems
          I       II      III
p_G      .60     .60     .40
p_GA     .46     .00     .33
p_GB     .61     .61     .42
P_1      .15     .20     .10
P_2      .15     .05     .10
P_3      .15     .05     .10
P_4      .55     .70     .70

p_I  = ratio of individual solutions to attempts
p_G  = ratio of group solutions to attempts
p_GA = estimate of P_G from Model A and observation p_I
p_GB = estimate of P_G from Model B and weights P_1, P_2, P_3, and P_4
P_1  = probability that an individual will solve both stages in Model B
P_2  = probability that an individual will solve stage 1 but not stage 2 in Model B
P_3  = probability that an individual will solve stage 2 but not stage 1 in Model B
P_4  = probability that an individual will not solve either stage in Model B
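
The excess attributed to pooling, that is, the probability that a group of four contains no $X_1$ yet still holds both an $X_2$ and an $X_3$, has a short closed form by inclusion and exclusion. The sketch below is an illustration with a hypothetical function name, using the trial weights quoted in the text.

```python
def excess_over_model_a(p1, p2, p3, p4, k=4):
    """P(group has no X1 but contains both an X2 and an X3) under Model B."""
    no_x1 = (1.0 - p1) ** k
    # No X1 and pooling fails: the group lacks X2, or lacks X3 (inclusion-exclusion).
    no_x1_and_fail = (p3 + p4) ** k + (p2 + p4) ** k - p4**k
    return no_x1 - no_x1_and_fail

excess = {
    "I": excess_over_model_a(0.15, 0.15, 0.15, 0.55),
    "II": excess_over_model_a(0.20, 0.05, 0.05, 0.70),
    "III": excess_over_model_a(0.10, 0.10, 0.10, 0.70),
}
for name, value in excess.items():
    print(f"Problem {name}: excess = {value:.3f}")
```

The three values round to the .13, .017, and .077 given in the text for Problems I, II, and III.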

The stage-wise model hypothesizing the pooling of ability tends to
reproduce the observed $p_G$ when reasonable weights are used. Indeed,
unreasonable weights produce major discrepancies from the observed $p_G$.

The implication of the model is that group superiority may be conceived
as a function only of pooling the abilities of its members. Ultimately, empirical
estimates must be obtained for $P_2$, $P_3$, and $P_4$. One experimental procedure
for such estimates would require individuals to solve the problem. For
instance, in a two-stage problem, those individuals solving the problem
(as in the Shaw data) provide a basis for estimating $P_1$. Some individuals
who failed the whole problem, however, will have accomplished stage 1
successfully but failed on stage 2, providing a basis for estimating $P_2$. The
remainder, those who could not accomplish stage 1, would be given the
problem reduced by the accomplishment of stage 1, reported as a fact, with
the requirement that the “new” problem be solved. Some of the individuals
will then solve the “new” problem, providing a basis for estimating $P_3$.

When $P_1$, $P_2$, $P_3$, and $P_4$ are estimated by $p_1$, $p_2$, $p_3$, and $p_4$ on the
basis of sample observations, assuming Model B holds, a value $p_{GB}$ will be
obtained and contrasted with $p_G$. As in Model A, the probability that an
observed difference will be exceeded by chance must be computed in order to
examine the tenability of the model. Under the assumption that Model B
holds, and replacing $P_1$, $P_2$, $P_3$, and $P_4$ by their estimates, it is possible
to obtain the exact distribution of $(p_G - p_{GB})$, although it is extremely
tedious to compute. If $p_i$ is based on $n_i$ observations, then $p_{GB}$ can assume
$(n_1 + 1)(n_2 + 1)(n_3 + 1)(n_4 + 1)$ values. Even if the sample sizes are
small, say $n_i$ = 5, $p_{GB}$ takes on 1296 values. This, plus the difficulty of
actually computing the probability of a difference $(p_{GB} - p_G)$, renders the
technique somewhat useless. Moreover, for large samples an asymptotic
method seems fruitless because of the special way the multinomial
distribution is summed for this situation.

Suppose, however, a confidence interval for $P_G$, say $P_{G1}$ and $P_{G2}$,
is obtained from $p_G$. Assuming the model holds, all the sets $p_1$, $p_2$, $p_3$, $p_4$
which yield values between $P_{G1}$ and $P_{G2}$ inclusive, form a confidence region
for $P_1$, $P_2$, $P_3$, $P_4$. Actually all that need be done then is to consider the
value $p_{GB}$ yielded by the observed $p_1$, $p_2$, $p_3$, $p_4$. If this value lies between
$P_{G1}$ and $P_{G2}$ the model is tenable for the specified confidence coefficient
employed, let us say $1 - \alpha$, or equivalently for the significance level $\alpha$.
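
The interval check described above can be sketched as follows. The text does not say how the confidence interval for $P_G$ is to be constructed; the sketch assumes an exact (Clopper-Pearson type) binomial interval found by bisection, and all function names are hypothetical. The value .61 is the Model B figure quoted earlier for Problem I, used here only as an illustration since it comes from trial weights rather than sample estimates.

```python
from math import comb

def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1.0 - p) ** (n - j) for j in range(x + 1))

def bisect(f, target, increasing, lo=0.0, hi=1.0, iters=60):
    """Solve f(p) = target on [lo, hi] for a monotone function f."""
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def exact_interval(x, n, alpha=0.05):
    """Assumed construction: equal-tailed exact binomial limits for a proportion."""
    lower = 0.0 if x == 0 else bisect(
        lambda p: 1.0 - binom_cdf(x - 1, n, p), alpha / 2.0, True)
    upper = 1.0 if x == n else bisect(
        lambda p: binom_cdf(x, n, p), alpha / 2.0, False)
    return lower, upper

def model_tenable(p_model, x, n, alpha=0.05):
    """Model value is tenable if it falls inside the interval for P_G."""
    lower, upper = exact_interval(x, n, alpha)
    return lower <= p_model <= upper

low, high = exact_interval(3, 5)          # 3 of 5 groups succeeded (Problem I)
tenable = model_tenable(0.61, 3, 5)
print(round(low, 3), round(high, 3), tenable)
```

With only five groups the interval is very wide, so a check of this kind has little power to reject the model; larger numbers of groups would be needed for a sharper test.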


Pooling of Data 


Shaw pools the results for the three problems, neglecting the fact that 
the same individuals and same groups worked the three problems in the same 
sequence. Thus, she contrasts 8/15 or 53 per cent success for groups with
5/63 or 7.9 per cent success for individuals. Using the z test given by (1) 
with the awareness that the lack of independence renders it inadequate, 
this difference is statistically significant at the 5 per cent level. Moreover, 
since the correlation between observations can be assumed to be positive, 
the decision of statistical significance is on the conservative side. Also, 
Model A is rejected using the z test given by (4). It should be emphasized
that of the five groups, two solve none of the three problems and two solve
all. Of the twenty-one individuals, none solves more than one of the three
problems. Two alternate hypotheses are suggested: 1) Model B is operating;
2) groups do better than individuals in a sequential solution of problems of 
the same kind. Hypothesis 2 can arise from three possibilities: (a) negative 
transfer in individuals, zero or positive transfer in groups; (b) zero transfer in 
individuals, positive transfer in groups; (c) positive transfer in individuals, 
greater positive transfer in groups. As regards hypothesis 2, Cook (1), using 
two versions of the disc problem (Problem III), varying in difficulty of 
sequence, implies “that transfer ‘spuriously’ lowers the probability of a 
given individual achieving the same degree of success or failure (relative to 
the rest of the groups) on both problems.” The evidence from Shaw’s groups 
suggests somewhat the same conclusion by indicating the plausibility of 
positive transfer in groups in sequential solution of problems of the same 
kind. A carefully designed experiment to ascertain the superiority of groups 
over individuals in transfer of training is suggested by this combined evidence. 

In principle it is possible to design automata to display any explicitly
described behavior. The McCulloch-Pitts “neuron” is a convenient elementary
component for the control mechanisms of automata. Previously described 
techniques permit the design of an automaton which would arbitrarily well 
simulate human behavior. The difficulty of producing such a design lies 
primarily in formulating an explicit description of the required behavior. 
The control mechanism of such an automaton would be of very great logical 
complexity. Its mode of operation probably would not resemble that of a 
human brain. The brain is more plausibly represented by stochastic models 
as proposed by Hebb. Such models can more easily be designed or understood 
by reason of lesser logical complexity. A method of computational investi- 
gation of the functioning of such stochastic models is described. Several 
extremely simple models have been investigated. One is shown to have 
properties suggestive of learning ability. 


I. Introduction 


The possibility of constructing or designing devices which can to some 
extent simulate the complex patterns of behavior of man or the higher 
animals, particularly those aspects of behavior favored by an intact nervous 
system, has long excited lively speculation. It seems desirable to continue 
such speculation in the present era and to bring modern resources to bear 
on the problem in the hope that some light may be shed on the mechanisms 
operative in these complex behaviors. 

At best it can be hoped that this approach may yield hints of possible 
mechanisms; yet, in light of the formidable difficulties of direct studies of 
complex nervous systems, even this modest hope amply motivates such 
investigations. The utility of this approach need not be further stressed since 
many authors have testified to its fruitfulness. 

A number of lines of argument directed toward the establishment of
limits on the range of behaviors possible for an automaton have
been explored. Several often-proposed arguments to this end have been 
shown by Turing (14) to be incapable of clear formulation or incapable of 
leading to the desired limitation. These arguments seem to be stimulated 
by a common motive: in the course of a normal childhood development a 
person finds it possible to some extent to reduce to order the initial chaos of 
observation of the external world, including animate objects. By reason of 
this orderliness he finds himself possessed of a measure of control. The
subjectively recognized “self” is somewhat separated from the environment
in the ordering. Since the analogy between his objectively observed self and
his companions is too striking to be overlooked, there arises the desire to
exempt his kind from the observed lawfulness of the external world. (The
term “kind” by intent lacks precision. Primitive peoples have often extended
this exemption very generously. The modern tendency seems to be to restrict 
it to our own species or, more narrowly, to one’s own tribe, sex, sect, etc.) A 
view which may be so motivated is aptly expressed by Jefferson (6): “No 
mechanism could feel (and not merely artificially signal, an easy contrivance) 
pleasure at its successes, feel grief when its valves fuse, be warmed by flattery, 
be made miserable by its mistakes, be charmed by sex, be angry or depressed 
when it cannot get what it wants.” 

That an upper bound can in fact be placed on the complexity of behavior 
which an automaton could display can be shown by paraphrasing an argument 
due to Turing (13). He describes as a computable number a denumerable 
sequence of digits which can be generated by a computer of finite complexity 
(i.e., an infinite ordered sequence of decisions implicit in a description which 
is finite, however lengthy). He shows that the set of computable numbers is 
denumerable; thus they are negligibly few among all real numbers. In a
similar vein we may approximate the entire range of environmental circum- 
stances and the reaction of an automaton thereto by denumerable sets. A 
realizable behavior pattern is a functional relation between these sets which 
permits an implicit description in finite terms, hence demonstrable by an 
automaton of finite complexity. Turing’s argument, appropriately transposed, 
shows that the realizable behavior patterns are denumerable, hence include 
negligibly few among the continuum of behavior patterns. In outline, this 
argument may be stated as follows: The finite descriptions may be translated 
into a standard language in which each description consists of a finite sequence 
of words drawn from a finite vocabulary. These descriptions may be classified 
by number of words and, within each class, arranged in alphabetical order. 
Thus the set of all descriptions which the language admits (including meaning- 
less and redundant descriptions) can be enumerated. It follows that the set 
of behavior patterns which can be so described is also denumerable. 
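
The enumeration used in this argument (descriptions classified by number of words and, within each class, arranged alphabetically) can be made concrete with a short sketch. The three-word vocabulary is a made-up example and the function names are hypothetical; the point is only that every finite description receives a definite position in a single list.

```python
from itertools import count, islice, product

def enumerate_descriptions(vocabulary):
    """Yield every finite word sequence: shorter first, alphabetical within a length."""
    words = sorted(set(vocabulary))
    for length in count(1):
        for description in product(words, repeat=length):
            yield description

def index_of(description, vocabulary):
    """Position (counting from 0) of a description in the enumeration above."""
    words = sorted(set(vocabulary))
    base = len(words)
    # Number of strictly shorter descriptions.
    shorter = sum(base**n for n in range(1, len(description)))
    # Alphabetical rank among descriptions of the same length.
    rank = 0
    for w in description:
        rank = rank * base + words.index(w)
    return shorter + rank

vocab = ["stop", "go", "turn"]            # hypothetical finite vocabulary
first = list(islice(enumerate_descriptions(vocab), 5))
print(first)
print(index_of(("go", "stop"), vocab))
```

Since `index_of` assigns a distinct natural number to each description, the set of descriptions, meaningless and redundant ones included, is in one-to-one correspondence with a subset of the integers, which is the denumerability claimed in the text.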

Unfortunately (from the point of view of the desire described above) 
this argument will not serve to limit the extent to which an automaton can 
mimic human behavior unless it can be shown that the range of human 
behavior patterns extends beyond this denumerable set. 

It must have appeared likely in previous centuries that a quite stringent 
limitation on the possibility of actually constructing an automaton to mimic 
human behavior is imposed by grossness of constructable elements of ma- 
chinery—gears, levers, pulleys, etc.—in comparison with human size. This 
limitation need not be seriously considered here, since it is of little importance 
to the present purpose that an automaton look like a man. In any case, this
appearance of limitation is weakened by the present development of semi- 
conductor electrical elements, which promises almost unlimited miniaturiza- 
tion of the types of electronic apparatus which now seem suitable elements 
for the construction of automata. 

It is also to be noted that valuable hints to neurophysiology may arise 
from the design of an automaton which, by reason of technical or economic 
limitations, may not be constructed in the metal. However, as the example 
of Walter’s Testudo (16) strikingly displays, the verisimilitude of an auto- 
maton’s simulation of animal behavior can far better be judged by direct 
observation of the behavior of the automaton than by study of its wiring 
diagram or differential equations. It thus seems desirable to give preferential, 
although not exclusive, consideration to those designs for automata which 
could be built without extravagant effort. 

Another line of argument bearing on the range of behavior patterns 
accessible to man-made automata has been explored by von Neumann (15). 
It is clear that the complexity of the behavior pattern of an automaton is 
subject to an upper limit dependent on the complexity of its mechanism. 
This complexity in turn would appear to be limited by the extent of human 
ingenuity, which may in turn be similarly limited. One is at first tempted 
to believe that suitable measures of these complexities would permit proof 
of a hierarchic ordering of machines. Further, he might be tempted to believe 
that a machine can fabricate, or in some sense design or conceive, only 
machines of lesser complexity than itself; similarly he might think that a 
man-made automaton must be, in respect to this measure, inferior to its 
creator. It may seem that some degradation of information must occur 
between the construction and operation of a machine. The hope for such a 
theorem is damped by von Neumann’s description, in outline, of a machine 
capable of fabricating a duplicate of itself after first (this to exclude trivial 
solutions) building a locomotive. The trick of design permitting this descrip- 
tion depends on the distinction between the actual and logical complexities 
of a machine. For example, the elaborate set of instructions which governs 
its manipulations may be carried as a perforated tape which, though of 
great actual complexity, can be copied by reiteration of a simple elementary 
operation. 

One might still hope to show it to be impossible for a man to understand 
a device of a complexity equal to his own even though he could build it 
(suitable meanings being given to these terms). Such a demonstration would 
require comparisons of logical complexity, so defined as not to increase 
markedly with simple reduplication of components, rather than actual com- 
plexity, defined, for example, by a simple counting of parts. The description 
of a universal computer given by Turing (13) would seem to weigh against 
this hope. This instrument, of finite logical complexity, is one capable of
predicting the operation of any computer, of however great complexity.
This great reduction in complexity is obtained at the expense of speed of 
operation which, though desirable, can hardly be regarded as of fundamental 
significance in human understanding. Moreover, an increase in speed can 
often be obtained by duplication of components without increase in logical 
complexity. It thus does not seem likely that arguments along these lines 
can place the desired limit on the range of behavior possible for an automaton. 


II. Neural Network Models 


Automata have variously been conceived as primarily composed of 
marble, clay, clockwork, and, more recently, of vacuum tubes, relays, etc., (2). 
The chief advantage of the modern components lies in the ease with which 
they can be physically assembled in practically operative equipment. If
only a theoretical reduction to practice is required, the advantage lies rather 
with the more intuitively understandable cams, detent gears, levers, and 
Jacquard cards involved, e.g., in Babbage’s conception of his analytical
engine (1). A still more convenient basic component is the idealized “neuron”
(designated as the M. P. neuron) used by McCulloch and Pitts (8), Culbert- 
son (3), and others. The properties of this mathematical construct are 
sufficiently simple to be easily realized in manufacturable equipment, yet bear 
a close analogy to the observed neurons of physiology. 

Briefly, the properties of a network of these mathematical neurons are 
the following: In each of a series of time intervals each neuron may “fire” 
or remain quiescent. Two neurons may be connected by a process which 
produces an effect on the second neuron whenever the first fired in the 
preceding time interval. This effect may tend either to excite or to inhibit 
the present firing of the second, depending on the kind of connecting process. 
The firing or non-firing of each neuron is dependent on the number of presently 
received effects of the two kinds. The functional dependence of this decision 
on the two numbers is subject to choice. 
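
These properties may be sketched in a few lines of Python. The sketch is a simplified illustration under stated assumptions, not the formalism of McCulloch and Pitts: the function names, the four-neuron network, and the particular firing rule are all hypothetical choices made for the example.

```python
def step(fired, excitatory, inhibitory, decide):
    """Advance a network of idealized neurons by one time interval.

    fired      -- list of booleans: which neurons fired in the preceding interval
    excitatory -- excitatory[j] lists the neurons sending excitatory processes to j
    inhibitory -- inhibitory[j] lists the neurons sending inhibitory processes to j
    decide     -- function (n_excite, n_inhibit) -> bool, the chosen firing rule
    """
    following = []
    for j in range(len(fired)):
        n_excite = sum(fired[i] for i in excitatory[j])
        n_inhibit = sum(fired[i] for i in inhibitory[j])
        following.append(decide(n_excite, n_inhibit))
    return following

def rule(n_excite, n_inhibit):
    # One possible rule (an assumption): fire when at least two excitatory
    # effects and no inhibitory effect are presently received.
    return n_excite >= 2 and n_inhibit == 0

# Neuron 2 fires when neurons 0 and 1 both fired; neuron 3 can veto it.
excitatory = [[], [], [0, 1], []]
inhibitory = [[], [], [3], []]

both = step([True, True, False, False], excitatory, inhibitory, rule)
vetoed = step([True, True, False, True], excitatory, inhibitory, rule)
```

In the first case only neuron 2 fires in the following interval; in the second the inhibitory process from neuron 3 suppresses it.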

The mathematical neuron is well adapted to the design of automata. 
One can, with surprising ease, design circuits to mediate even quite complex 
activities (3). The chief difficulty lies in formulating a precise description of 
the desired activity. The process of translating this description into circuitry 
can be carried out essentially by rote. It is thus not out of reason to assume 
that circuits could be designed to mediate each identifiable aspect of the 
behavior of a human adult. It may further be assumed possible to design 
suitable interlocks which suppress the activities of these circuits until after 
appropriate environmental circumstances have occurred. For the purpose 
of the argument to follow it will be assumed that in this way one could design 
an automaton which would satisfactorily simulate both the learning steps 
and the learned behavior patterns of an entire lifetime. By reason of its 
part-by-part design this may be termed a block-diagram model.


III. Defects of the Block-Diagram Model


The neural-network-controlled automaton so designed seems at first 
to supply a complete answer to the primary question of this investigation. 
It would simulate human behavior in every situation taken into consideration 
by the designer; thus, the completeness of the simulation is limited only by 
his patience. Yet for a number of reasons this answer is unsatisfying; this 
block-diagram automaton does not seem to present a close analogy to the 
observed human nervous system as indicated by the following considerations: 


(1) The automaton might include some circuits permitting it, say, to dis- 
play a full command of every modern language and further circuits inhibiting 
their action until environmental circumstances (e.g., exposure to a course 
of study of a language) make the display of each ability seemly. This heroic 
design effort might require the use of most of the 10^10-odd neurons permitted
by direct analogy, yet would still not provide, for example, for the learning 
of Sanskrit. The view that human ability to learn any few of the enormously 
many known languages is based on the release from inhibition of precisely 
arranged circuits is strikingly unappealing. 

(2) The learning ability of the higher animals seems quite unsystematic 
in comparison with that of an automaton designed in this way. For example, 
a man’s capacity to learn to drive an automobile or a rat’s to learn a T-maze 
can hardly be ascribed to the pressure of natural selection in the short period 
since the introduction of these features of environment. They rather suggest 
the operation of an unspecific learning ability operative in a wide range of 
circumstances. An automaton which mimics the behavior of a laboratory 
rat by means of many marvelously contrived circuits, each initially frustrated 
by an equally marvelous inhibiting circuit, suggests great virtuosity but not 
true efficiency of design. 

(3) If the block-diagram automaton is efficiently designed with respect 
to the number of neurons used, the modifications in its behavior which would 
result from the extirpation of a small fraction of its neurons would be most 
striking. Some of its possibilities of learning would disappear, other abilities 
would spring forth full-blown as their inhibiting mechanisms are inactivated, 
without the normally required training program. Some abilities previously 
acquired by training would be irrevocably lost. If the control circuits are 
redundantly designed, reliability of operation being achieved at the cost of a 
many-fold increase in the number of neurons used, similar effects would 
follow the destruction of larger parts of the control circuits. 

The effects of cerebral injury in the mammals present a quite different 
picture (10). The resulting changes in behavior are often surprisingly mild, 
suggesting considerable redundancy of design; to some extent these changes 
are reparable by retraining. These changes are, from the present point of
view, uniformly pejorative. [As Wiener points out (17), prefrontal lobotomy
may usually be expected to increase a patient’s tractability, not his wit.]
To find, for example, the victim of a brain accident subsequently in command 
of a new language would occasion surprise. 

(4) The complexity of the ingeniously contrived neural circuit con- 
trolling the behavior of the block-diagram automaton might be comparable 
with that of the intricate interconnection of the 10^10-odd neurons of a human
central nervous system. The latter, however, is presumably built to a pattern 
held in the 10°-odd genes controlling human heredity. [The number of human 
gene positions has variously been estimated as 25,000 to 100,000 (12).] These 
must also be presumed to determine the architecture of other than neural 
tissues and much of intracellular physiology as well. Thus, it seems likely that 
the genes serving to determine the circuitry of the human central nervous 
system number no more than a few thousand. This consideration suggests 
that the essential logical complexity of the human nervous system is far less 
than the maximum which the number of neurons and synaptic junctions would
permit.


IV. The Hebb Model


A very different model of a complex central nervous system has been 
examined by Hebb (5). (Hebb does not use our present oblique approach, but 
addresses himself directly to the study of actual nervous systems. For this 
discussion, it is convenient, however, to regard his description as that of a 
proposed automaton design.) The chief design aim of this model is minimiza- 
tion of logical complexity. The elements composing his model are given 
many of the properties which the cells of the human nervous system are 
currently thought to have. They may thus be termed neurons less metaphori- 
cally than the elements described above. They have one further property 
which, though it finds some support in neurophysiology, may be regarded 
as an ad hoc assumption. It is assumed that each firing of a neuron is ac- 
companied by the strengthening of the synaptic junctions through which it 
was stimulated. This property, called neurobiotaxis, makes any often-
repeated chain of neural firings progressively more easily initiated.

The Hebb neuron is a much more complex structure than the M. P. 
neuron. It partakes in a few hundred rather than in a few synapses. Its 
firings are determined by a complicated interplay of stimulations, periods 
of continuously varying excitability following a previous firing (the relative 
refractory period); its firings are not subjected to a coarse-grained time 
quantification. Moreover, it has, by reason of the assumed neurobiotaxis, a 
more fine-grained and more retentive memory than the one time unit memory 
of the M. P. neuron. This greater complexity of the Hebb neuron does not 
give added scope to the organizations which can in principle be based on it, 
since its properties can be duplicated with any desired accuracy by structures 
composed of M. P. neurons. The Hebb automaton thus relegates greater
logical complexity to its basic element than does the block-diagram model.

The logical simplicity of the Hebb model lies in the bold assumption 
that the interconnections of the neurons are for the most part not planned. 
The neurons are regarded as produced in vast numbers by a broadcast 
mechanism (e.g., by successive cell division) and to position and interconnect 
themselves in a way which is determined by design only as regards gross 
architectural features. The detailed wiring of the model, in which neurons 
form synaptic junctions with others, is randomly determined. Here and 
in what follows the term “random” is used in the lay sense (unplanned,
nondescript, determined by happenstance) rather than in the broader
mathematical sense. The Hebb picture of the cerebral cortex may be likened 
to the subsoil aspect of a forest. The complex matting of roots is not the 
result of meticulous engineering but of the chance placement of the grains 
of sand, drops of water, etc., which influenced the growth of each of the 
roots. There are architectural features—tree roots by and large go deeper 
than grass roots—but the precise configuration of roots is not subject to 
plan. This is not meant to suggest that the growth of a grass root, or of an 
axon, is exempt from causality, but only that myriad other configurations 
of roots would serve as well to nourish and support the forest. 

The essence of Hebb’s discussion lies in the observation that a large 
random network of neurons must be presumed to include many circular 
(reverberatory) chains, capable of sustained activity when once excited. 
Those reverberations frequently excited by particular combinations or con- 
catenations of stimuli tend to be fixed by neurobiotaxis and may be evoked 
by progressively smaller aliquots of the constellation of stimuli initially 
required. The first-formed elementary reverberations will interact among 
themselves to form higher-order associations and combinations, thus leading 
to a complex hierarchical structure. 
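
The assumed strengthening of junctions, and its consequence that a smaller part of the original stimulus comes to suffice, can be illustrated by a deliberately crude Python sketch. It is far simpler than the Hebb neuron described above: a single unit with integer junction strengths and a fixed threshold. The function `stimulate`, the junction labels, and all numerical values are assumptions made for the example.

```python
def stimulate(strengths, active, threshold, increment=1):
    """Apply one stimulus to a single model neuron.

    strengths -- dict mapping each input junction to its current strength
    active    -- the junctions stimulated on this occasion
    Returns (fired, updated_strengths).
    """
    drive = sum(strengths[j] for j in active)
    fired = drive >= threshold
    updated = dict(strengths)
    if fired:
        # Strengthen only the junctions through which the neuron was stimulated.
        for j in active:
            updated[j] += increment
    return fired, updated

strengths = {"a": 4, "b": 4, "c": 4}          # hypothetical starting values
partial_before, _ = stimulate(strengths, {"a", "b"}, threshold=10)
for _ in range(3):                             # repeated full stimulation
    _, strengths = stimulate(strengths, {"a", "b", "c"}, threshold=10)
partial_after, _ = stimulate(strengths, {"a", "b"}, threshold=10)
```

Before the repetitions the partial stimulus fails to fire the unit; after three repetitions of the full stimulus it succeeds.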

In a network of M. P. neurons provided only with excitatory inter- 
connections, a stimulus can more readily excite an appropriate response 
than suppress other inappropriate responses. [Von Neumann (15) has shown 
that networks of M. P. neurons can be given full logical universality with- 
out the use of inhibitory interconnections. The device used, however, does 
not lend itself to use in a random model.] The Hebb neuron, unlike the M. P., 
displays a significantly long refractory period. This makes possible the 
suppression or inhibition of reverberatory activity by excitatory processes 
alone. If two reverberatory chains share the use of a number of neurons, the 
excitation of one reverberation may, by fatiguing shared neurons, tend to 
suppress activity in the other. Similarly, any strikingly intense, widespread 
excitation of the network, which may be identified with painful stimuli to 
an animal, will tend to break up the current large-scale pattern of activity. 
Hebb’s model looks to this disruption of over-all patterns of activity by 
intense stimulation to effect macroscopic (goal-directed) learning.


V. Possibilities of Computational Investigation


Hebb’s extensive qualitative discussion has the aim of making plausible 
the view that the impressive abilities of a human nervous system are expli- 
cable with this (conceptually) simple set of hypotheses. The above discussion 
is intended as a description of this aim, not as a summary or as a critical 
review of Hebb’s argument. The success of such a plausibility-proof must 
be judged by each reader for himself. I find it profoundly convincing. 

The difficulties which stand in the way of a firm analytic proof of the 
adequacy of the Hebb model are highly formidable. These do not, as might 
first be thought, have primarily to do with the enormously large number of 
chaotically interconnected components. Statistical techniques for investi- 
gating the properties of such assemblages are reasonably well developed. 
The chief difficulties of analysis lie rather in the moderately great complexity 
of the elementary unit and of the gross structural features of the observed 
human nervous system. In order to preserve reasonable verisimilitude, a 
model of the human nervous system might require the description of, say, 
one hundred distinct regions, each with its own statistically described pattern 
of organization. This number is not so small as to permit carrying out, with 
reasonable labor, an analysis which takes each region separately into account, 
nor is it so large—and each region so unimportant—as to permit using a 
statistical description which overrides their distinctions. 

Despite the formidable difficulties confronting an attempt to prove the 
adequacy of the Hebb model as representing the human nervous system, it 
does not seem out of reason to attempt more modest checks of the basic 
features of the model. Considerable simplification of the task can be effected 
by omission from the model of many features which are auxiliary to the
dramatic and characteristically mammalian behavior patterns. It would 
seem reasonable to omit from a preliminary analysis any representation of the 
neurological components of the homeostatic mechanisms controlling temper- 
ature, pH, etc. Similarly, the model might be divorced from most, if not all, 
of the normal mammalian sensorium and control over musculature. Some 
means must be provided for representing an interaction of the model with 
its environment, but for first analysis it would seem sufficient to provide 
some logically simple (though unphysiological) input and output mechanisms. 
There would likewise seem little reason to provide the model with any wired- 
in interconnections between input and output to provide for the demonstra- 
tion of unlearned reflexive or instinctual behavior. In brief, the model need 
not be required to display any of the aspects of animal behavior for which 
the possibility of mechanical representation is subject to little doubt. 

So stripping the model of lesser requirements may considerably simplify 
checking its ability to simulate intelligence. If carried to completion, how-
ever, this stripping might prevent the display of any goal-directed learning.
A rat who feels no hunger cannot be expected to display shrewdness in a
food-goal maze. The simplification of the model should thus stop short of
the removal of all affective inputs. It may suffice, however, to leave one
goal-associated feature of the input which, in the interpretation of the 
behavior of the model, plays the role of a generalized indication of pain 
(or alternatively of pleasure). Each particular environment of the model 
will be specified by a functional dependence of the inputs to the model on 
its (present and prior) outputs, i.e., its experiences are at least in part de- 
termined by its behavior. The extent to which the behavior of the model 
serves to diminish (or, in the alternative case, to increase) the frequency of 
stimulation of the goal-associated input is then a measure of the goal-directed 
learning displayed by the model in each environmental situation. 

Even with these simplifications it is not clear that the present techniques 
of mathematical analysis permit a more penetrating study of the expectable 
performance of stochastic neural network models than that to be found in 
Hebb’s qualitative discussion. A more promising approach would seem to be 
offered by the use of a modern general-purpose digital computer to simulate 
the behavior of specific examples of the model in specific environmental 
situations. Some loss of mathematical rigor is unavoidable in this method 
of study since the performance of a few haphazardly selected examples would 
be taken as typifying the performances of the enormously large ensemble of 
possible realizations of each model, the details of which are randomly speci- 
fied. The danger of being misled by a fluke performance is, however, no 
greater than that which occurs in most stochastic investigations and admits 
the usual statistical safeguards. This technique of investigation of a highly 
multidimensional ensemble by sampling, known as the Monte Carlo Method, 
has been investigated by Ulam, von Neumann, and others (9). In some 
applications this technique proves notably efficient (18). The study of the 
Hebb model and other stochastic neural network models appears to be such 
an application. 


VI. The Three-Layer Model 


Prior to his acquaintance with the Hebb model, the author initiated the 
investigation of a stochastic neural-network model composed of M. P. 
neurons (17). In this three-layer model the neurons are distributed upon a 
surface in three classes (layers) serving particular purposes. One, the trunk 
layer, was to transmit to all parts of the surface notice of the reception 
anywhere of a painful stimulus. Another, the granular layer, was to record 
certain special events by the initiation of spatially localized reverberations. 
These reverberations may be extinguished by the passage of a wave of 
excitation along the trunk layer. They thus provide a pain-limited temporary 
memory of the occurrence of the special events which initiated them. In the 
third, the primary layer, the neurons are interconnected by long-range pro-
cesses, unlike those of the trunk layer and granular layer, which have only
local connections. The simultaneous firing of two neighboring neurons of the 
primary layer constitutes the special event to be recorded in the granular 
layer. A reverberation in the granular layer is to produce a progressive 
increase in the sensitivity of neurons of the primary layer near to the neurons 
which initiated the reverberation.

It was hoped that by appropriately specifying the statistical structure 
of its neural interconnections the network could be shown to display proper- 
ties suggestive of learning ability. This learning ability of the model may 
be expected to increase indefinitely (at least in some quantitative sense) as 
the size, but not the logical complexity, of the network is increased, hence 
without increase of the ingenuity invested in its design. 

It proved easy to show the desired properties for the trunk and granular 
layers (4). The trunk layer neurons were densely interconnected and given 
an appreciable refractory period. These parameters could be widely varied, 
still permitting the propagation of a wave of excitation. The neurons were 
arranged in a rectangular array, opposite edges of which were regarded as 
contiguous so as to make the surface topologically of spherical or, more 
conveniently, of toroidal character. In this way disturbances owing to 
atypical characteristics at boundaries were avoided. The wave of excitation 
would spread over the entire surface leaving behind a refractory zone and, 
upon reconverging to a point, be extinguished. 
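
The behavior described for the trunk layer can be reproduced in a small Python sketch. The details are assumptions made for illustration, since the connectivity and parameters actually studied are not given here: each neuron is taken to receive processes from its four nearest neighbors on a toroidal array, to fire whenever any neighbor fired in the preceding cycle, and to be refractory for two cycles after firing. The name `propagate` is likewise hypothetical.

```python
def propagate(height, width, start, refractory=2, max_cycles=1000):
    """Spread a wave of excitation over a toroidal array of neurons.

    Returns (history, quiet), where history maps each neuron (row, col) to
    the list of cycles in which it fired and quiet is the first cycle in
    which no neuron fired (or max_cycles if the activity never stopped).
    """
    firing = {start}
    last_fired = {start: 0}
    history = {start: [0]}
    cycle = 0
    while firing and cycle < max_cycles:
        cycle += 1
        # Every neighbor of a firing neuron is a candidate; opposite edges
        # of the array are treated as contiguous by the modulo arithmetic.
        candidates = set()
        for (r, c) in firing:
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                candidates.add(((r + dr) % height, (c + dc) % width))
        # A candidate fires unless it is still within its refractory period.
        firing = {n for n in candidates
                  if n not in last_fired or cycle - last_fired[n] > refractory}
        for n in firing:
            last_fired[n] = cycle
            history.setdefault(n, []).append(cycle)
    return history, cycle

history, quiet = propagate(6, 8, start=(0, 0))
```

Under these assumptions every one of the 48 neurons fires exactly once, and the wave is extinguished where it reconverges on the far side of the torus.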

A few possible structures for the granular layer were examined. The 
performance of the layer was found to be favored by a short refractory 
period and by considerable statistical fluctuation in the degree of local 
connectivity; hence the name granular. This permits the maintenance of 
many independent local reverberations while preventing any long-range 
spreading of excitation. An extreme, perhaps trivial, solution is to give each 
neuron unit threshold and one self-exciting process. 

The study of the properties of the primary layer presented considerably 
greater difficulty and seemed to require the use of calculating equipment 
of greater speed and flexibility than was readily available. It was proposed 
at that time (1949) that an electronic calculator of considerable memory 
capacity be used in the further exploration of a three-layer model, with 
attention focused chiefly on the primary layer. 

In the summer of 1951, through the generosity of the National Research 
Development Corporation and of the Department of Mathematics of the 
University of Manchester, the author had opportunity to initiate this ex- 
ploration with the use of the Manchester Mark I computer. The results of 
the calculations made at that time are described below. Although they are of a 
qualitative and preliminary nature they support the suggestion that this 
technique of investigation is likely to prove fruitful.


VII. Manchester Calculations


It was decided, for the sake of computational simplicity, to make no 
attempt to represent explicitly the neurons of the granular and trunk layers 
but to subsume their supposed effects on the thresholds of excitation of the 
primary layer neurons in the rules governing the computations. 

In a first series of calculations planned, a large number of neurons was 
to be represented. Each was to receive excitatory processes from a fixed or 
variable number of others selected by a random process. To avoid over- 
burdening the rapid-access memory capacity of the computer, it proved 
convenient to represent the series of random numbers which describe the 
interconnections of the neurons by an algebraic formula from which they 
could be repeatedly calculated rather than to store the series in the memory. 
These numbers are thus not truly random, but have sufficient complexity to 
be considered quasi-random, i.e., sufficiently disorderly for the purpose. 
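
The idea of recomputing the interconnections from a formula, rather than storing them, may be sketched as follows. The formula actually used on the Manchester machine is not given in the text; the mixing function below (the finalizer of the modern SplitMix64 generator) is a substitute chosen purely for illustration, and the names `mix`, `is_connected`, and `sources_of` are assumptions.

```python
MASK = (1 << 64) - 1

def mix(x):
    """Scramble an integer into 64 quasi-random bits (SplitMix64 finalizer)."""
    z = (x + 0x9E3779B97F4A7C15) & MASK
    z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK
    z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK
    return z ^ (z >> 31)

def is_connected(source, target, n_neurons, mean_inputs):
    """Decide, by formula alone, whether `source` sends a process to `target`."""
    if source == target:
        return False
    u = mix(target * n_neurons + source) / 2.0 ** 64   # number between 0 and 1
    return u < mean_inputs / n_neurons

def sources_of(target, n_neurons, mean_inputs):
    """Recompute on demand the neurons sending processes to `target`."""
    return [s for s in range(n_neurons)
            if is_connected(s, target, n_neurons, mean_inputs)]

# Nothing is stored: the same list is regenerated whenever it is needed.
inputs_to_5 = sources_of(5, n_neurons=200, mean_inputs=10)
```

The wiring is fixed once the formula is fixed, yet no table of connections occupies the memory; the cost is the repeated arithmetic.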

In each cycle of the calculation the computer determines and displays 
which of the neurons fire; information for use in the succeeding cycle is 
recorded. The firing of a neuron is determined by the number of neurons, 
among those from which it receives excitatory processes, which fired in the 
preceding cycle. If that number equals or exceeds its assigned threshold 
it fires, otherwise not. Again to avoid overburdening the computer memory 
it was at first planned to take no account of the number of cycles elapsed 
since the neuron last previously fired, i.e., not to assume a refractory period 
exceeding one cycle. Modifications in the behavior of the network were to 
be effected by changes in the thresholds. 

As a preliminary to these experiments a series of calculations was carried 
out to determine suitable ranges of the number or mean number of excitatory 
processes brought to each neuron and the constant or mean threshold. It 
soon became evident that no suitable values of these parameters could be 
found. For any value of the threshold exceeding unity the level of reaction 
was intrinsically unstable. The number of neurons firing either fell to zero 
after a few cycles or rose to almost the full number of neurons represented. 
An elementary statistical calculation shows that this behavior is to be ex- 
pected in the models as tried. 
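
One form such a calculation might take is sketched below in Python; it is offered as a plausible reconstruction, not as the calculation actually made. It assumes that each neuron receives exactly k excitatory processes and that the neurons at their origin fired independently with probability p in the preceding cycle, so that the fraction firing in the next cycle is the binomial probability of at least theta successes. The names `next_fraction` and `settle` and the values k = 10, theta = 3 are assumptions.

```python
from math import comb

def next_fraction(p, k, theta):
    """Fraction firing next cycle: probability that at least `theta` of a
    neuron's `k` inputs fired, each having fired with probability `p`."""
    return sum(comb(k, m) * p ** m * (1 - p) ** (k - m)
               for m in range(theta, k + 1))

def settle(p0, k, theta, cycles=50):
    """Iterate the level of reaction from an initial fraction p0."""
    p = p0
    for _ in range(cycles):
        p = next_fraction(p, k, theta)
    return p

dies_out = settle(0.05, k=10, theta=3)    # starts below the unstable level
saturates = settle(0.30, k=10, theta=3)   # starts above it
```

With these values the intermediate level of reaction is unstable: a start slightly below it decays to zero within a few cycles, while a start above it climbs to essentially the full number of neurons.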

Two methods of overcoming this difficulty presented themselves. One 
is the use of neurons with definite refractory periods considerably exceeding 
the synaptic delay time, i.e., many cycles of the calculation. This should tend 
to depress the upper stable level of reaction by reason of the refractory 
condition of most of the neurons at each cycle. This procedure seems un- 
attractive, since the firing of a neuron would then depend chiefly on its release 
from inhibition rather than upon the immediately preceding pattern of 
firings.

The second method of stabilizing the level of reaction is logically appeal-
ing though seemingly unphysiological. It is to use only inhibitory rather
than excitatory interconnections of the neurons. This gives the level of 
reaction negative rather than positive feedback characteristics, thus producing 
a single stable level of reaction. This procedure avoids the necessity of 
maintaining in the computer memory a record of the number of cycles 
elapsed since the last previous firing of each neuron, as would be required if 
long refractory periods were taken into account. It was accordingly decided 
to adopt this latter procedure. 


VIII. The Linear Inhibitory Model 


Having taken this one step away from physiological plausibility, another 
became appealing. The complexity of the computational procedure can be 
considerably reduced by limiting the group of neurons to which each neuron 
may be responsive. Accordingly, it was decided to represent the neurons of 
the primary layer as arranged in a circular sequence and to select the neurons 
sending inhibitory processes to each neuron from among the forty immediate 
predecessor neurons in this sequence. A further computational simplifica-
tion is afforded by making the firing of each neuron contingent on the just
prior computed firings of its predecessors rather than those of the preceding 
cycle. It is to be noted that this simplification is effected at the cost of a 
great reduction in the amount of information in storage at each stage. 

The computational procedure, so simplified, was set up with the follow- 
ing characteristics: Each neuron received inhibitory processes from some 
among its forty predecessors, each of these being included or excluded with 
equal probability. Initially each neuron was given a threshold of inhibition 
of five, i.e., if five or more among its twenty-odd selected predecessors had 
fired it did not fire; otherwise it did. Arrangements were provided to replace 
this determination by selected firings of some sensory neurons to represent 
environmental stimuli. This arrangement may be expected to lead to the 
firing of approximately one-fourth of the neurons in each cycle. A cycle now 
becomes simply one traversal of the circular array of neurons from an 
arbitrary starting point. Provision was also made for the increase by unity 
of the thresholds of all neurons which had fired in the preceding cycle, at 
the choice of the operator. He was thus enabled to simulate the assumed 
effect of the granular and trunk layers in rewarding a successful performance. 
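
The procedure just described may be sketched in Python as follows. This is a reconstruction under assumptions, not the Manchester program: the number of neurons (200), the use of Python's `random` module for the interconnections, the all-quiescent starting state, and the omission of the sensory inputs are all choices made for the example, as are the names `build_model`, `run_cycle`, and `reward`.

```python
import random

def build_model(n_neurons=200, span=40, threshold=5, seed=0):
    """Circular sequence of neurons, each inhibited by a random half of
    its `span` immediate predecessors."""
    rng = random.Random(seed)
    inhibitors = [[(j - d) % n_neurons for d in range(1, span + 1)
                   if rng.random() < 0.5]
                  for j in range(n_neurons)]
    return {"inhibitors": inhibitors,
            "thresholds": [threshold] * n_neurons,
            "fired": [False] * n_neurons}

def run_cycle(model):
    """One traversal of the circle. Each neuron consults the most recently
    computed firings of its predecessors, so the state is updated in place."""
    fired = model["fired"]
    for j, sources in enumerate(model["inhibitors"]):
        inhibition = sum(fired[i] for i in sources)
        fired[j] = inhibition < model["thresholds"][j]
    return list(fired)

def reward(model):
    """Raise by unity the threshold of every neuron that fired in the
    cycle just completed."""
    for j, did_fire in enumerate(model["fired"]):
        if did_fire:
            model["thresholds"][j] += 1

model = build_model()
fractions = [sum(run_cycle(model)) / 200 for _ in range(20)]
```

The list `fractions` records the proportion of neurons firing in each cycle; calling `reward(model)` after a cycle judged successful plays the part of the operator's intervention.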

This linear inhibitory model performed qualitatively as expected on its 
initial cycles. The number of neurons firing per cycle remained approximately 
constant. No pattern was readily visible in the change from cycle to cycle 
in the set of neurons fired. Thus, prior to any learning experiment the activity 
of the model seemed substantially random. It is to be noted, however, that 
the state of the model at any moment is specified by only forty binary
alternatives: the firing or not of the forty preceding neurons. Taking into
account the fact that in the mean only one-fourth of the neurons fire this 
represents a store of only roughly 33 bits of information. It is therefore to be
expected that the appearance of randomness of the firing pattern is not 
deep-seated. 
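
The size of this store can be checked by a short calculation, given here as a Python sketch. It assumes that the figure is the Shannon information of forty independent binary alternatives, each a firing with probability one-fourth; the text does not state how the figure was obtained, and the names `binary_entropy` and `stored_bits` are introduced for the example.

```python
from math import log2

def binary_entropy(p):
    """Information, in bits, carried by one alternative of probability p."""
    return -(p * log2(p) + (1 - p) * log2(1 - p))

# Forty preceding neurons, each firing with probability one-fourth.
stored_bits = 40 * binary_entropy(0.25)
```

On this assumption each alternative carries about 0.81 bit, and the forty together about 32.5 bits, of the same size as the figure quoted in the text.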

As a first simple learning experiment it was decided to promote the 
firing of a particular neuron, making no use of the input mechanism repre- 
senting environmental stimuli. A neuron was chosen which, prior to the 
learning experiment, fired in very nearly one-fourth of the cycles. Its firing 
was taken as the criterion of a successful cycle. In the learning experiment 
the model was run as initially set up until the first cycle in which the selected 
neuron fired. The model was then rewarded as described above, i.e., each 
neuron which had fired in that cycle had its threshold raised to six. On 
superficial examination the experiment seemed strikingly successful; there- 
after the selected neuron fired in every cycle, although no further reward was 
supplied! On closer examination, however, it appeared that the structuring 
of the firing pattern resulting from the reward was excessive. After the 
reward the model fell into an immutable pattern: each neuron fired either 
in every cycle or in none. This fixed pattern was similar to that which occurred 
in the rewarded cycle but not identical. Thus, if the selected neuron had 
been one for which the two patterns differed, the experiment would have 
seemed totally unsuccessful; the model would have learned the opposite 
from the intended behavior. 

The result of this experiment is unsatisfactory in another way. Since 
the firing pattern was fixed by the first reward, the model was not capable 
of further learning. A milder form of reward would presumably have been 
preferable in diminishing the likelihood of fixing the wrong pattern of behavior. 
It is also clear that a satisfactory model requires a much greater stock of 
randomness in its initial behavior. 

This result suggests that a learning mechanism may need to be guarded 
against excessively rapid learning, which could lead to its leaping to un- 
justified conclusions. The optimum learning rate would seem to be determined 
by the opposing hazards of accidental learning, brought about by statistical 
fluctuations and accidental correlations of actually unrelated things, and 
the dangers (depending on the environmental circumstances) of learning 
too slowly. It is interesting to speculate on the effect of the considerable 
increase in the prevalent life span which the human species has experienced 
in the course of its last few thousand generations. It seems possible that 
this has made more prevalent the defect of human intelligence that arises 
from too-rapid learning. It may appear that the typical mentally maladapted 
individual has not learned too little but rather too much that is not true. 

It has not as yet proved possible to test the behavior of this model as 
modified in the ways suggested by this result. It is the author’s hope that 
the suggestion of success shown by this model will serve to stimulate similar 
investigations in some laboratories having electronic computing machines. 


BOOK REVIEWS 


J. P. Guilford. Psychometric Methods. (2nd Ed.) New York: McGraw-Hill, 1954, pp. 
ix + 597. 


Psychometric Methods has been renovated and enlarged for its second edition. Although 
it retains the character of its 18-year-old predecessor, the new edition includes some major 
changes. An introductory chapter on measurement theory has been added. Most of the 
statistical topics have been removed, including the old chapters on simple and multiple 
correlation. The treatment of psychophysics and scaling has been improved by forming 
chapters on psychophysical theory and on principles of judgment from material formerly 
interspersed in the descriptions of the specific methods. Psychological testing now occupies 
three chapters instead of one. Finally, the problems at the end of each chapter have been 
revised and are accompanied by answers whenever practicable. 

The book begins with a lucid account of the logical basis of psychological measure- 
ment, and a discussion of nominal, ordinal, interval, and ratio scales. This is followed by a 
comparison of the classical psychophysics of Weber ratios, difference limens, and Fechner’s 
law, with the modern psychophysics of discriminal dispersions, the law of comparative 
judgment, and stimulus-response matrices. (S is used for stimulus and R for response, 
instead of the awkward reversal perpetrated by the Germans.) The third chapter covers 
mathematical functions, curve fitting, and probability distributions. The major psycho- 
physical methods and scaling methods are covered in the next seven chapters. Included 
are the methods of average error, minimal changes, constants, pair comparisons (Guilford 
prefers pair to paired), rank order, equal sense distances (bisection), equal-appearing 
intervals, fractionation, constant sums, and successive categories. Experimental designs 
and computational procedures are illustrated for each method, the pros and cons are dis- 
cussed, and variants of the methods are noted in short paragraphs. Short sections are also 
devoted to allied problems, including multidimensional scaling, the objectivity of judg- 
ments, and the prediction of first choices. The scaling material concludes with chapters on 
rating scales and principles of judgment. The latter includes discussions of judgment times, 
the time-order error, anchoring, judgment sets, regression phenomena, and Helson’s concept 
of adaptation level. 

The field of testing requires three long chapters. The discussion of test theory includes 
a detailed account of the theory based on independent true and error scores, and brief 
mention of some new additions or alternatives proposed by Lord, Loevinger, Ferguson, 
and others. Speed and power problems and scoring problems are discussed. Reliability, 
validity, and item analysis are treated at length. A brief account is given of attitude scale 
construction. The final chapter is devoted to factor analysis. It includes discussion of general 
issues in factor analysis, and provides detailed recipes for centroid factoring and graphic 
rotation of axes. 

It is not possible to encompass all of present-day psychometrics in a single volume. 
However, Guilford has managed to include most of the popular techniques, and has provided 
extensive references for those who want supplementary information. In the areas of psycho- 
physics and scaling, where references are scattered and reviews are few, Guilford’s treat- 
ment is more thorough than in the areas of testing and factor analysis, where other good 
summaries are available. There was obviously not space enough to include all of the new 
techniques and ideas. The up-and-down method, probit analysis, and Coombs’ general 
approach to scaling all deserve more extended treatment than they receive. In general, 
though, Guilford’s coverage is excellent. 

Guilford’s treatment of the method of successive categories is the weakest section in 
the book. In contrast to the usual clarity of exposition, this section is fuzzy and very difficult 
to follow. (There are some printing errors to add to the confusion.) Some of the details are 
right, others are wrong, or at least dubious. His method for estimating category boundaries— 
or limens—is standard, but then he suggests locating the stimuli by finding the interpolated 
medians of the judgment distributions on this scale of limens. According to him, means are 
harder to find, and in either case, trouble arises when judgment distributions are truncated, 
i.e., when many judgments are in an extreme category. Actually, when appropriate pro- 
cedures are used, the stimulus means are easy to determine, and the method is indifferent 
to truncation. The basic difficulty with the presentation is that the successive categories 
model is never stated explicitly. In fact, Guilford seems to reject the model when he argues 
that the categories themselves should somehow be scaled rather than the boundaries 
between categories. His procedure for scaling the categories makes no sense to this reviewer. 
Since the method of successive categories has great utility, this section of the book is 
especially disappointing. 

Guilford’s exposition is usually very clear, and his style is straightforward. The 
clarity is slightly compromised by the fact that Σ never appears with an index of summation 
or limits. This is sometimes confusing, although seldom ambiguous. The text includes 
many examples that help the reader to follow the development. However, it would have 
been better strategy to draw more psychophysical examples from sensory psychology. 
Emphasis on problems like discrimination, masking, and target detection in vision and 
audition, rather than on lifted weights, might interest students who now find psychophysics 
dull. 

In a book of this sort, it is sometimes necessary to introduce formulas magically, 
either because there is not space to derive the formulas, or because the development would 
be beyond the mathematical abilities of most students. In Psychometric Methods magic is 
used quite frequently. There are several places where a few words of explanation would 
reveal the trick and allow the student to understand the development or at least to get an 
intuitive grasp of the idea. Some examples will be cited here. In describing scoring formulas 
for tests, Guilford states that R − W/(k − 1) is perfectly correlated with R + B/k. If the 
student realizes that rights plus wrongs plus blanks equals the total number of items on the 
test, he can easily prove the assertion: the insight should have been supplied in the text. 
Fitting a straight line by the method of averages is very simple to comprehend if the student 
realizes that he is really selecting two subgroups of points and drawing a straight line 
through the means of these subgroups. However, Guilford’s purely algebraic discussion is 
likely to seem magical. 
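
Both insights can be made concrete in a few lines. The sketch below is illustrative only: the function names are hypothetical, and the split of the points into a lower and an upper half by x is one common convention for the method of averages, assumed here rather than taken from Guilford's text. Because R + W + B = n, the score R − W/(k − 1) equals [k/(k − 1)](R + B/k) − n/(k − 1), a linear function of R + B/k, which is why the two are perfectly correlated.

```python
from fractions import Fraction


def corrected_score(rights: int, wrongs: int, k: int) -> Fraction:
    """Rights minus a penalty for wrongs: R - W/(k - 1)."""
    return Fraction(rights) - Fraction(wrongs, k - 1)


def blanks_credited_score(rights: int, blanks: int, k: int) -> Fraction:
    """Rights plus chance credit for blanks: R + B/k."""
    return Fraction(rights) + Fraction(blanks, k)


def linear_link(score_b: Fraction, n_items: int, k: int) -> Fraction:
    """Map R + B/k onto R - W/(k - 1), using R + W + B = n_items."""
    return Fraction(k, k - 1) * score_b - Fraction(n_items, k - 1)


def fit_by_averages(xs, ys):
    """Method of averages for a straight line.

    Assumption: points are sorted by x and split into a lower and an upper
    group; the fitted line passes through the two group means.
    """
    pairs = sorted(zip(xs, ys))
    half = len(pairs) // 2
    low, high = pairs[:half], pairs[half:]
    mean_x_low = sum(x for x, _ in low) / len(low)
    mean_y_low = sum(y for _, y in low) / len(low)
    mean_x_high = sum(x for x, _ in high) / len(high)
    mean_y_high = sum(y for _, y in high) / len(high)
    slope = (mean_y_high - mean_y_low) / (mean_x_high - mean_x_low)
    intercept = mean_y_low - slope * mean_x_low
    return slope, intercept


if __name__ == "__main__":
    # A 10-item test with 5 alternatives: 6 right, 3 wrong, 1 blank.
    a = corrected_score(6, 3, 5)
    b = blanks_credited_score(6, 1, 5)
    print(a, linear_link(b, 10, 5))
    print(fit_by_averages([1, 2, 3, 4], [3, 5, 7, 9]))
```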

Another example is provided by Tucker’s version of Kuder-Richardson formula 20, 
which is written so that a priori estimates of the variance of the item p’s can be inserted. 
It is not stated in the text that Tucker’s formula is algebraically equivalent to K. R. 20. 
Indeed, the text implies a lack of equivalence. A slightly different case is the formula for 
estimating discriminal dispersions in Case III of the law of comparative judgment, which 
is presented uncritically. The inquisitive student will discover that the approximation is 
based on some very tenuous assumptions and may not be very approximate. The student 
should have been warned that this one is done with mirrors. 
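
The equivalence in the Kuder-Richardson case rests on a simple identity: the sum of the item variances pq equals k times the mean p times the mean q, less k times the variance of the item p's. The sketch below checks this numerically. It is an illustration under stated assumptions; the exact form printed in Guilford's text is not reproduced in this review, so the second function is only one way such a variant may be written, and both function names are hypothetical.

```python
def kr20(item_p, total_variance):
    """Kuder-Richardson formula 20 from item difficulties and total-score variance."""
    k = len(item_p)
    sum_pq = sum(p * (1.0 - p) for p in item_p)
    return k / (k - 1) * (1.0 - sum_pq / total_variance)


def kr20_mean_variance_form(item_p, total_variance):
    """The same coefficient written with the mean and variance of the item p's.

    Uses sum(p * q) = k * p_bar * q_bar - k * var_p, so that an a priori
    estimate of var_p could be substituted for the computed value.
    """
    k = len(item_p)
    p_bar = sum(item_p) / k
    var_p = sum((p - p_bar) ** 2 for p in item_p) / k
    sum_pq = k * p_bar * (1.0 - p_bar) - k * var_p
    return k / (k - 1) * (1.0 - sum_pq / total_variance)


if __name__ == "__main__":
    difficulties = [0.2, 0.5, 0.8, 0.6]  # made-up item p's
    variance = 1.5                       # made-up total-score variance
    print(kr20(difficulties, variance))
    print(kr20_mean_variance_form(difficulties, variance))
```

The two functions agree to rounding error for any set of item difficulties, which is all that algebraic equivalence requires.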

The book has its share of errors. Most of these are in the numerical examples, the 
table headings, and in the formulas. Three are worth noting here. In formula (10.7), p. 253, 
the S? should be preceded by 2. In formula (14.17), p. 386, the Σ in the denominator should 
be inside the parentheses. In formula (16.1), p. 472, the radicals indicating the fourth root 
of the numerator and the square root of the denominator are omitted. Most of the other 
errors that we noticed will not bother an alert self-confident student. 

The appendix contains a fine collection of useful tables. Table C, which gives normal 
deviates and ordinates for various values of the area, is especially valuable and can be 
found almost nowhere else. 

The major changes in Psychometric Methods are in scope and organization. The reader’s 
evaluation of the second edition can be predicted accurately from his estimate of its pre- 
decessor. 


Massachusetts Institute of Technology Bert F. Green 


C. Radhakrishna Rao. Advanced Statistical Methods in Biometric Research. New York: 
John Wiley and Sons, 1952, pp. xvii + 390. 


This book should be of much interest to social scientists and other investigators who 
are so often confronted with data requiring multivariate analysis of one kind or another. 
The style, which presupposes a working knowledge of elementary statistics, is a combination 
of terse mathematical statement followed by examples, mostly from the fields of anthro- 
pometry and genetics. A psychological application (p. 316, p. 370) is of importance, for it 
illustrates the solution to the problem of “types,” under the restriction that measurement 
dispersion matrices for the types are identical. 

The first chapter neatly summarizes that part of matrix algebra most useful in sta- 
tistics, including quadratic forms. Also, the technique of pivotal condensation for evaluating 
determinants and matrix inverses is first discussed here, and throughout the book the 
value of this method for simplifying computations is ably demonstrated. 
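
For readers unfamiliar with the idea, pivotal condensation reduces the order of a determinant by one at each step, eliminating a column below a chosen pivot. The sketch below is a generic illustration and not a reproduction of the tabular scheme in Rao's book; the function name and the choice of the largest available pivot are assumptions made here.

```python
def det_by_pivotal_condensation(matrix):
    """Evaluate a determinant by successive pivotal condensation.

    Assumption: at each step the largest entry (in absolute value) in the
    current column is taken as pivot, for numerical stability.
    """
    a = [[float(v) for v in row] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot_row = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot_row][col] == 0.0:
            return 0.0  # the whole column is zero, so the matrix is singular
        if pivot_row != col:
            a[col], a[pivot_row] = a[pivot_row], a[col]
            det = -det  # a row interchange reverses the sign
        pivot = a[col][col]
        det *= pivot
        # Condense: clear the pivot column from every row below the pivot.
        for r in range(col + 1, n):
            factor = a[r][col] / pivot
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


if __name__ == "__main__":
    print(det_by_pivotal_condensation([[2, 1], [1, 3]]))
    print(det_by_pivotal_condensation([[4, 3, 2], [2, 1, 3], [3, 4, 1]]))
```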

The second chapter gives statistical distributions in common usage, followed by the 
multivariate distributions required for tests treated in subsequent chapters. Some practical 
insight into the use of distributions for constructing multivariate tests is provided by this 
chapter and Chapter 7. 

The remaining chapters are oriented toward testing hypotheses, with adequate 
emphasis on cases where variates are correlated. The last three chapters contain, with 
only minor revisions, the author’s previously published contributions to the theory and use 
of classification, or discrimination, functions. The value of this work for psychologists and 
anthropologists can hardly be overemphasized. 

Chapter 4 contains an interesting and original presentation of maximum likelihood 
estimation, where the Fisherian concepts of efficient scoring and amount of information in 
scores are illustrated. Chapters 3-6 contain useful sections on many of the traditional 
problems of inferential statistics. Analysis of variance is discussed only briefly, but a new 
technique for obtaining an interaction sum of squares is given, and a problem requiring 
classical analysis of covariance is fully illustrated. (Generalized analysis of variance and 
covariance, or “analysis of dispersion,” is treated in Chapter 7.) The sections on chi-square 
are clear and relatively complete; for example, included are the evaluation of 2 × 2 tables 
with more than one degree of freedom, the use of Dandekar’s (instead of Yates’) correction, 
and a more exact approximation to the normal distribution than that obtained by using 
$\sqrt{2\chi^2} - \sqrt{2n - 1}$. An equation on page 197 is incorrect, but in the context the slip is 
obvious to the reader. 
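
The difference such a refinement can make is easy to see numerically. The review does not say which approximation Rao gives, so the sketch below compares the square-root transformation with the Wilson-Hilferty cube-root transformation, chosen here only as a well-known example of a more exact alternative; the function names are hypothetical.

```python
import math


def sqrt_approx_z(chi2: float, n: int) -> float:
    """Normal deviate from sqrt(2 * chi2) - sqrt(2n - 1), n degrees of freedom."""
    return math.sqrt(2.0 * chi2) - math.sqrt(2.0 * n - 1.0)


def wilson_hilferty_z(chi2: float, n: int) -> float:
    """Normal deviate from the Wilson-Hilferty cube-root transformation.

    Assumption: used only as an illustration of a sharper approximation,
    not as the method printed in the book under review.
    """
    c = 2.0 / (9.0 * n)
    return ((chi2 / n) ** (1.0 / 3.0) - (1.0 - c)) / math.sqrt(c)


if __name__ == "__main__":
    # 18.307 is the tabled upper 5 per cent point of chi-square with 10
    # degrees of freedom; the exact normal deviate is about 1.645.
    print(sqrt_approx_z(18.307, 10))
    print(wilson_hilferty_z(18.307, 10))
```

At this point the cube-root form comes much closer to 1.645 than the square-root form does.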

Several appendices are included, one of which contains a number of original lemmas 
on classificatory problems; another contains two methods for applying a Schmidt trans- 
formation to obtain uncorrelated variates. 

The main disadvantage of the book lies in the fact that most readers who will want 
to use the methods may find it difficult to make rather abrupt transitions from very general 
mathematical thinking to concrete applications. In other words, the book may be too 
difficult for those for whom the applications seem most pertinent. Also, psychologists may 
be disappointed that the final chapter contains so little on factor analysis. The author 
makes use of canonical variates and correlations without clearly relating them to Hotelling’s 
method of factor analysis. But such criticisms are unimportant in view of the many re- 
markable contributions so adequately and creatively utilized in this volume. 


Mellon Foundation    Harold Webster 
Vassar College 


Raymond B. Cattell. Factor Analysis: An Introduction and Manual for the Psychologist 
and Social Scientist. New York: Harper & Bros., 1952, pp. xiii + 462. 


In the Preface of Factor Analysis, Cattell has set forth three principal requirements 
which the book should fulfill: (1) “. . . to meet the need of the general student in science to 
gain ideas of what factor analysis is about and to understand how it integrates with scientific 
methods and concepts generally,” (2) to serve “as a textbook for statistics courses which 
deal with factor analysis for the first time, either as an appreciable part or as the whole 
of the semester course,” and (3) “. . . to supply a handbook for the research worker, the 
student, and the statistical clerk which will be a practical guide with respect to carrying 
out the processes most frequently in use.” To achieve these three objectives Cattell has 
written the book in three sections: I Basic Concepts in Factor Analysis, II Specific Aims 
and Working Methods, and III General Principles and Problems. Each of these sections 
has been planned in terms of a sound psychological approach to teaching in which the 
reader is carried from simple concepts to more complex ones, from general principles to 
specific items, and from elementary numerical examples to illustrative problems involving 
numerous intricate steps. 

It is helpful to evaluate the contribution of the book in terms of the extent to which 
it appears that the author has attained each of the three requirements set forth. Although 
giving the impression of being somewhat missionary in his remarks concerning the scope 
of the applications of factor analysis, Cattell has done well in explaining at a readily- 
grasped intuitive level the basic principles underlying factor analysis and in stating the 
numerous uses to which the factor analytic techniques can be put. One of the strongest 
features of the text is the thorough and penetrating discussion of the place of factor analysis 
in the design of experiments. In addition to explaining at length the characteristics and 
application of the R and Q, P and O, and S and T techniques, the author has attempted 
to relate these six procedures to features of the classical experimental design and to modern 
approaches involving use of analysis of variance in factorial designs. Moreover, he has 
shown explicitly the potentialities of factor analysis not only in theory construction, but 
also in applied fields of educational and social psychology. Not the least interesting of his 
achievements is the discussion of the nature of the interpretation of factors that appear in 
an analysis. In short, the first objective has been achieved in a noteworthy fashion. 

Since the realization of the second objective concerning the textbook function actually 
depends upon the fulfillment of the third objective relating to the handbook service of the 
book, it is advantageous next to evaluate the degree to which the third objective has been 
attained. As a manual, the book has numerous apparent shortcomings: 

(1) It would appear that an attempt has been made to explain too many methods of 
centroid extraction, communality estimation, and factor extraction relative to the limited 
space devoted to those topics. A somewhat more extensive explanation of fewer 
techniques might have been desirable. 

(2) The steps involved in the various clustering methods do not seem to be easy for 
the beginner to grasp, since the illustrative examples are not clearly related to the procedures 
described. For example, the explanation of the group method of factoring (pp. 174-8) 
seems to be unnecessarily confusing and ambiguous. The origin of the entries appearing 
in the table at the top of page 176 remains a mystery to the reviewer. 

(3) The format of the computational explanations is such that one cannot grasp in a 
readily-apparent fashion the objectives toward which the writer is trying to lead the reader. 
Paragraph captions or headings would be particularly helpful. In short, each of the steps 
involved in the calculations is simply not clearly set forth for the reader to perceive. Each 
rule or procedural item should be directly related to a specific numerical operation. 

(4) The explanation concerning the rotation process through use of graphs is sub- 
stantially inadequate if the text is to serve as a manual. What is seriously needed is a set 
of graphs to illustrate in a step-by-step fashion the solution of a representative problem 
involving between 10 and 20 test variables. In addition, a paragraph or two in which an 
explanation is given as to why each rotation was undertaken would be most helpful to 
the beginning student. Both orthogonal and oblique rotations should be considered at 
much greater length. Although mastery of the art of rotation requires extensive experience, 
a list of guiding principles that are related to illustrative plots would constitute an important 
teaching aid. 

(5) The presence of numerous errors is particularly annoying and confusing to both 
the beginner and the experienced worker. One rather serious mistake occurs in the equation 
near the top of page 232. Instead of F, or Vr = VoAF the equation should be written as 
F, or Vp = VoAF⁻¹. There is some doubt as to whether the geometric interpretation of 
reflection in centroid extraction that is presented on page 54 is correct. Numerous minor 
errors are present. A few examples may be cited: a double negative on page 157, line 9, 
which would not seem to be intended; one numerical entry of 0.37 in line 3 of the second 
paragraph on page 160 when the value of 0.38 is intended; misplacement of the decimal 
point of the numerical entry in the denominator of the fraction appearing at the bottom of 
page 160; an incorrect numerical entry (4.45 instead of 4.72) in the denominator of the 
fractions from which m is calculated on page 172; an apparently erroneous value of .10 
instead of about .15 in the second row and first column of Table 26 on page 201; the use of 
communality when square root of communality is intended on line 17 of page 205; the 
use of “are” when “is” is intended on the fifth line from the bottom of page 256; an in- 
correct reference to Table 27 on page 214; and, least important, the misspelling of the 
reviewer’s name. 

(6) Much needed is a summary in one location of the matrix equations that are 
frequently employed in factor-analysis studies—a set of 12 or 15 equations that show 
various interrelationships among the primary-factor, reference-factor, arbitrary-orthogonal- 
factor loadings, the intercorrelations of the factors (both types), and the relationship of 
correlation coefficients to various types of factor loadings. Such a summary would serve 
to unify much of the illustrative material. 

These remarks represent to a large extent a consensus based on the numerous state- 
ments of students who have used the book as a text and upon the comments of professors 
who have either required the book in courses or have attempted to use it as a manual in 
their own research. In short, the third requirement has not been realized. 

Since the book as a manual is somewhat limited in the clarity of its exposition with 
respect to the use of numerical procedures, the second requirement concerning its function 
as a textbook has not been met to an adequate degree. It would appear that the instructor 
in factor analysis would need to require a second text to supplement the content of Cattell’s 
Factor Analysis if skills in factoring are to be gained. Students have consistently reported 
that it has been necessary to consult other sources at length to clarify what are essentially 
routine steps involved in clerical procedures. 

One of the most pleasing features of the book is Cattell’s style of writing, which is 
informal and conversational in its tone. His ample use of cleverly devised figures of speech 
such as similes, personifications, and metaphors offers many an opportunity for a smile 
as well as a refreshing change of perspective in the reader’s orientation to the field of ab- 
stractions that pour forth page after page. A few examples may be definitive: 

“This business of reflecting, however, can become as exasperating as trying to hold 
three footballs in two hands; for as we make r’s positive as a whole for one variable, we 
make some individual r’s in the column negative for other tests” (p. 55). 

“The search for common characteristics in the loaded variables which would give a 
first hunch as to the nature of the factor is beset by difficulties when the loadings are not 
very high, and always presents possibilities of being misleading. To take a trivial, not to 
say frivolous, example, if two drunken men and two sober men constituted our population 
and one of the former had had Scotch and soda while the other had had Bourbon and soda, 
but the sober men had had nothing we should obtain correlations suggesting a cluster or 
factor in which drunkenness and soda would be most strongly loaded. Only a person who 
knew that the variables—Bourbon and Scotch—contained the common influence alcohol 
would recognize the role of alcohol in the drunkenness syndrome; and only the choice of a 
sufficiently varied population to include persons who had drunk soda but not alcohol 
would reduce the soda variable to its proper negligible loading in the drunkenness factor” 
(pp. 75-76). 

“. . . and it has frequently happened that a reference vector which has obstinately 
eluded stabilization has been led to a recognizable hyperplane by this method as soon as 
all its fellow reference vectors have become sufficiently convincing in their hyperplanes to 
apply it” (p. 213). 

“The points are gradually being tracked down by these successive moves and 
shepherded into a restricted area as are sheep by a well-trained sheep dog” (p. 54). 

In its current form the book is an excellent source for the person interested in the 
general principles of factor analysis, in the place of factor analysis in experimental design, 
and in types of problems in the social sciences for which factor analysis may be useful. 
However, as a guide or manual to be employed in the actual performance of a factor analysis 
the book is of doubtful value. Despite the limitations as a manual, it would be a useful 
supplementary text in beginning courses in factor analysis. 


University of Southern California William B. Michael 


Kit of Selected Tests for Reference Aptitude and Achievement Factors. Educational Testing 
Service, Princeton, New Jersey: October, 1954. 


This kit contains three reference tests for each of sixteen aptitude and achievement 
factors, as well as a manual giving detailed information about these tests. The purpose of 
the kit is stated as follows by French: “Tests in this kit are suggested for use in factorial 
studies where representation is desired for any of the . . . aptitude or achievement factors 
(included) . . . It is intended that use of the Kit tests for defining reference factors will 
facilitate interpretation and the confident comparison of one factor study with another.” 
The factors to be included were chosen and the tests selected by a variety of overlapping 
committees, whose work was mainly done by correspondence. Tests and manual are con- 
tained in a strong folder which should help in keeping them together in one place. 

While the work of assembly was obviously well done, a number of disturbing thoughts 
will occur to the reader. Is the democratic process of committee discussion really well 
adapted to the production of scientific truth? Would physicists consent with equanimity 
to have elements defined in terms of voting and a show of hands rather than in terms of 
decisive proof? Would the inclusion of non-American writers (Vernon, Meili) have decisively 
altered the contents of this kit? These are important problems, but they are not discussed 
in the manual. Perhaps the employment of some such technique as Ahmavaara’s 
(Ahmavaara, Y. Transformation analysis of factorial data. Helsinki: Suomalaisen Tiedeaka- 
temian. 1954) might have helped to objectify judgments. 

Another difficulty will strike many readers. Some of the factors possess a high degree 
of generality, such as the general reasoning factor; others, like the aiming factor, are very 
specific indeed. To put both types, as well as factors of intermediate coverage, into the 
same kit raises problems of specificity and generality which, again, are not discussed in the 
manual. Nor is there a discussion of the very meaning of the term “aptitude” used to 
designate these different types of factors. In the reviewer’s department, tests with high 
loadings on the “aiming” factor have been found to be excellent measures of temperamental 
abnormality; to what extent can we rest content with having them treated as pure ability 
tests? 

Another difficulty that arises is due to the failure of the committees to consider 
evidence from outside the factorial field. To take but two simple examples, we may wonder 
to what extent reactive inhibition (I_R) and conditional inhibition (sI_R) play a part in the 
performance of dotting, aiming, and tracing tests. To what extent, also, do individual 
differences in inhibition formation determine score? On a rather higher plane we may ask 
about the determination of the results on all the tests included of orectic factors, which 
have been shown (Furneaux, W. D. Some speed, error, and difficulty relationships within 
a problem-solving situation. Nature, 1952, 170, 37) to exert an important and variable 
influence. The fact that none of these objections are discussed or met by French is less 
his fault than that of factor analysts, who in general tend to pay little attention to the 
findings of general psychology in their work. Nevertheless, it does tend to make this collec- 
tion less valuable than it might otherwise have been. It also gives a false feeling of security 
to investigators who may wish to work in this field. 

This brings us to the last point. The kit apparently is intended for research workers 
who may wish to use these tests as reference markers. It is difficult to see why in this case 
the actual tests have been included with the manual. Research workers in any case will 
have to obtain sets of tests which they wanted to employ, and they would certainly be 
expected to be familiar with the current literature and the tests included in the kit if the 
research to be done were to be taken seriously. For the purpose of ensuring the use of 
reference markers, therefore, the manual itself would have been quite sufficient. It seems 
to the reviewer that the main use of the kit will be, not for research workers, but for in- 
structors who wish to show their students illustrative tests of the main factors isolated 
by factor analysts. For this purpose the kit is admirably selected and constructed; it is 
to be hoped, however, that in his discussion the instructor will not forget to include some 
of the matters raised in the first few paragraphs of this review. 


Department of Psychology H. J. Eysenck 
Institute of Psychiatry 
University of London 











